Hi,

Also, we should update our SQL scripts for MySQL to use InnoDB rather than MyISAM: MyISAM takes a whole-table lock for operations such as deletes, whereas InnoDB only takes row-level locks.
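As a rough sketch of that change (the table and column names below are hypothetical, for illustration only; the real names come from our generated scripts):

```sql
-- For an existing MyISAM table, convert it in place
-- (hypothetical table name):
ALTER TABLE ANALYTICS_EVENTS ENGINE = InnoDB;

-- New tables in the scripts should declare the engine explicitly
-- instead of relying on the server default:
CREATE TABLE ANALYTICS_EVENTS_V2 (
    record_id VARCHAR(50) NOT NULL PRIMARY KEY,
    timestamp BIGINT      NOT NULL,
    data      BLOB
) ENGINE = InnoDB;

-- Verify the engine after the migration:
SHOW TABLE STATUS WHERE Name = 'ANALYTICS_EVENTS';
```

The `SHOW TABLE STATUS` output should report `InnoDB` in the `Engine` column for each converted table.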
Cheers,
Anjana.

On Tue, Sep 1, 2015 at 11:07 AM, Gokul Balakrishnan <[email protected]> wrote:

> Hi Sinthuja/all,
>
> It seems the MySQL RDS instance we've been using for testing has gone down.
> We'll conduct the test again once this is rectified.
>
> Thanks,
>
> On 1 September 2015 at 11:00, Sinthuja Ragendran <[email protected]> wrote:
>
>> Hi Anjana,
>>
>> On Mon, Aug 31, 2015 at 10:23 PM, Anjana Fernando <[email protected]> wrote:
>>
>>> Hi Sinthuja,
>>>
>>> For this test, disable the indexing and try again. Here we are just
>>> testing the publishing of events to the server; indexing will add more
>>> load to the database and can cause timeouts, etc. For that, we can use a
>>> different record store for the index staging part, and so on.
>>
>> Yeah, the latest test I ran was without indexing and with the CApp
>> Maninda used for testing. Further, as mentioned, I was able to see that
>> the publisher sometimes stops and then resumes. While the publisher was
>> stopped, I took the thread dump that was attached in the last mail. There
>> you can see the receiver queue is full and the worker threads are busy
>> inserting the records into the DAL. I expect the same thing is happening
>> in Maninda's setup as well, but we need a thread dump of DAS to confirm it.
>>
>> Thanks,
>> Sinthuja.
>>
>>> Cheers,
>>> Anjana.
>>>
>>> On Tue, Sep 1, 2015 at 10:47 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>
>>>> Hi Maninda,
>>>>
>>>> I did a test with MySQL, and I was able to publish 10M events. There
>>>> were some hiccups as I mentioned before: I could see the receiver queue
>>>> was full while the event sink worker threads were writing to the
>>>> database. Please refer to the attached thread dump, which was taken
>>>> while the publisher was paused because of this. Please run the test
>>>> from your side and share your observations.
>>>>
>>>> Thanks,
>>>> Sinthuja.
>>>>
>>>> On Mon, Aug 31, 2015 at 8:50 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> I'll also test with MySQL on my local machine in the meantime.
>>>>> Apparently I observed really high CPU usage from DAS while the
>>>>> publisher was normal, which is the opposite of your observation.
>>>>> Please add the System.out printouts to the agent code as discussed
>>>>> offline and share the results.
>>>>>
>>>>> Thanks,
>>>>> Sinthuja.
>>>>>
>>>>> On Mon, Aug 31, 2015 at 8:46 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>
>>>>>> Hi Sinthuja,
>>>>>>
>>>>>> I have used MySQL in RDS, and an indexing-disabled version of the
>>>>>> smart home CApp to isolate issues; I have attached it here. I could
>>>>>> not see any errors on the DAS side, which may explain the lower CPU
>>>>>> usage in DAS than in the publisher compared to your setup, as we
>>>>>> discussed offline.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> *Maninda Edirisooriya*
>>>>>> Senior Software Engineer
>>>>>>
>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>
>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>> *E-mail* : [email protected]
>>>>>> *Skype* : @manindae
>>>>>> *Twitter* : @maninda
>>>>>>
>>>>>> On Tue, Sep 1, 2015 at 8:06 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Maninda,
>>>>>>>
>>>>>>> I tested this locally now, and I was able to see some hiccups when
>>>>>>> publishing. At the point when the first publisher had more or less
>>>>>>> paused publishing, I started a new publisher; that also succeeded
>>>>>>> only until its event queue became full, and then it too stopped
>>>>>>> pushing. Can you confirm that the same behaviour was observed in
>>>>>>> your publisher? I think this may have made you think the publisher
>>>>>>> had hung, but actually the receiver queue was full and it stopped
>>>>>>> accepting further events.
>>>>>>>
>>>>>>> And during that time, I was able to see multiple error logs on the
>>>>>>> DAS side [1]. Therefore I think the event-persisting thread has
>>>>>>> become very slow, and hence this behaviour was observed. I have
>>>>>>> attached the DAS thread dump, and I could see many threads in a
>>>>>>> blocked state on the H2 database. Which database are you using for
>>>>>>> the test? I think it's better to try MySQL or another
>>>>>>> production-recommended database.
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>> [2015-08-31 19:17:04,359] ERROR {org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer} - Error in processing index batch operations: [-1000:__INDEX_DATA__] does not exist
>>>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsTableNotAvailableException: [-1000:__INDEX_DATA__] does not exist
>>>>>>> at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.get(RDBMSAnalyticsRecordStore.java:319)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.loadIndexOperationRecords(AnalyticsDataIndexer.java:588)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:391)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>
>>>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsException: Error in deleting records: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) [50200-140]
>>>>>>> at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:519)
>>>>>>> at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:491)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.deleteIndexRecords(AnalyticsDataIndexer.java:581)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:414)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>>> at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>> Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>>>> [50200-140]
>>>>>>> at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
>>>>>>> at org.h2.message.DbException.get(DbException.java:167)
>>>>>>> at org.h2.message.DbException.get(DbException.java:144)
>>>>>>> at org.h2.table.RegularTable.doLock(RegularTable.java:466)
>>>>>>> at org.h2.table.RegularTable.lock(RegularTable.java:404)
>>>>>>> at org.h2.command.dml.Delete.update(Delete.java:50)
>>>>>>> at org.h2.command.CommandContainer.update(CommandContainer.java:70)
>>>>>>> at org.h2.command.Command.executeUpdate(Command.java:199)
>>>>>>> at org.h2.jdbc.JdbcPreparedStatement.executeUpdateInternal(JdbcPreparedStatement.java:141)
>>>>>>> at org.h2.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:127)
>>>>>>> at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:514)
>>>>>>> ... 9 more
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 10:01 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Maninda,
>>>>>>>>
>>>>>>>> OK, thanks for the information. I'll run the test locally and get
>>>>>>>> back to you.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sinthuja.
>>>>>>>>
>>>>>>>> On Mon, Aug 31, 2015 at 9:53 AM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Sinthuja,
>>>>>>>>>
>>>>>>>>> I tested the smart-home sample on the latest DAS with the agent
>>>>>>>>> config [1] and with the attached config directory (there,
>>>>>>>>> data-bridge-config.xml is as in [2]). I ran the test on EC2
>>>>>>>>> instances with a MySQL RDS instance as the DB.
>>>>>>>>> This issue was always reproducible when 10M events were published
>>>>>>>>> with the sample. For some time events get published, and then the
>>>>>>>>> server suddenly stops receiving events, but you can see the client
>>>>>>>>> is busy with CPU usage while DAS is almost idle.
>>>>>>>>> No debug logging was enabled.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>> <Agent>
>>>>>>>>>     <Name>Thrift</Name>
>>>>>>>>>     <DataEndpointClass>org.wso2.carbon.databridge.agent.endpoint.thrift.ThriftDataEndpoint</DataEndpointClass>
>>>>>>>>>     <TrustSore>src/main/resources/client-truststore.jks</TrustSore>
>>>>>>>>>     <TrustSorePassword>wso2carbon</TrustSorePassword>
>>>>>>>>>     <QueueSize>32768</QueueSize>
>>>>>>>>>     <BatchSize>200</BatchSize>
>>>>>>>>>     <CorePoolSize>5</CorePoolSize>
>>>>>>>>>     <MaxPoolSize>10</MaxPoolSize>
>>>>>>>>>     <KeepAliveTimeInPool>20</KeepAliveTimeInPool>
>>>>>>>>>     <ReconnectionInterval>30</ReconnectionInterval>
>>>>>>>>>     <MaxTransportPoolSize>250</MaxTransportPoolSize>
>>>>>>>>>     <MaxIdleConnections>250</MaxIdleConnections>
>>>>>>>>>     <EvictionTimePeriod>5500</EvictionTimePeriod>
>>>>>>>>>     <MinIdleTimeInPool>5000</MinIdleTimeInPool>
>>>>>>>>>     <SecureMaxTransportPoolSize>250</SecureMaxTransportPoolSize>
>>>>>>>>>     <SecureMaxIdleConnections>250</SecureMaxIdleConnections>
>>>>>>>>>     <SecureEvictionTimePeriod>5500</SecureEvictionTimePeriod>
>>>>>>>>>     <SecureMinIdleTimeInPool>5000</SecureMinIdleTimeInPool>
>>>>>>>>> </Agent>
>>>>>>>>>
>>>>>>>>> [2]
>>>>>>>>>
>>>>>>>>> <dataBridgeConfiguration>
>>>>>>>>>     <workerThreads>10</workerThreads>
>>>>>>>>>     <eventBufferCapacity>1000</eventBufferCapacity>
>>>>>>>>>     <clientTimeoutMin>30</clientTimeoutMin>
>>>>>>>>>
>>>>>>>>>     <dataReceiver name="Thrift">
>>>>>>>>>         <config name="tcpPort">7611</config>
>>>>>>>>>         <config name="sslPort">7711</config>
>>>>>>>>>     </dataReceiver>
>>>>>>>>>
>>>>>>>>>     <dataReceiver name="Binary">
>>>>>>>>>         <config name="tcpPort">9611</config>
>>>>>>>>>         <config name="sslPort">9711</config>
>>>>>>>>>         <config name="sslReceiverThreadPoolSize">100</config>
>>>>>>>>>         <config name="tcpReceiverThreadPoolSize">100</config>
>>>>>>>>>     </dataReceiver>
>>>>>>>>> </dataBridgeConfiguration>
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>> Senior Software Engineer
>>>>>>>>>
>>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>>
>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>> *E-mail* : [email protected]
>>>>>>>>> *Skype* : @manindae
>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>
>>>>>>>>> On Mon, Aug 31, 2015 at 8:08 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Are you running with debug logging? And can you reproduce this
>>>>>>>>>> consistently, or is it intermittent?
>>>>>>>>>>
>>>>>>>>>> Please provide the publisher- and receiver-side configs so I can
>>>>>>>>>> test this and see. As I have already tested with more than 10M
>>>>>>>>>> records, I'm not sure what the issue is here.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sinthuja.
>>>>>>>>>>
>>>>>>>>>> On Monday, August 31, 2015, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> When I started a 10M-event load test with the Smart Home sample
>>>>>>>>>>> in DAS, it ran for some time and then suddenly stopped receiving
>>>>>>>>>>> events. The publisher on the client was running at high CPU
>>>>>>>>>>> usage while DAS was running at very low CPU.
>>>>>>>>>>> When another data agent was spawned, it started to publish
>>>>>>>>>>> correctly, which suggests the issue is on the client side.
>>>>>>>>>>> We analyzed the thread dump twice, and both times the busiest
>>>>>>>>>>> thread had one of the following stack traces.
>>>>>>>>>>>
>>>>>>>>>>> 1.
>>>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:148)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>>> at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>>> at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>>>
>>>>>>>>>>> 2.
>>>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>>> at org.apache.log4j.Category.callAppenders(Category.java:202)
>>>>>>>>>>> at org.apache.log4j.Category.forcedLog(Category.java:391)
>>>>>>>>>>> at org.apache.log4j.Category.log(Category.java:856)
>>>>>>>>>>> at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:177)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.isActiveDataEndpointExists(DataEndpointGroup.java:264)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.access$400(DataEndpointGroup.java:46)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:155)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>>> at
org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>>> at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>>> at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>>> at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>>>
>>>>>>>>>>> We suspect that the *isActiveDataEndpointExists()* method in the
>>>>>>>>>>> *org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup*
>>>>>>>>>>> class is called repeatedly because the disruptor ring buffer
>>>>>>>>>>> fills up on the client side. Not sure why this happens.
>>>>>>>>>>>
>>>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>
>>>>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>>>>
>>>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>>>> *E-mail* : [email protected]
>>>>>>>>>>> *Skype* : @manindae
>>>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Sent from iPhone
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Sinthuja Rajendran*
>>>>>>>> Associate Technical Lead
>>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>>
>>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>>> Mobile: +94774273955
>>>>>>>
>>>>>>> --
>>>>>>> *Sinthuja Rajendran*
>>>>>>> Associate Technical Lead
>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>
>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>> Mobile: +94774273955
>>>>>
>>>>> --
>>>>> *Sinthuja Rajendran*
>>>>> Associate Technical Lead
>>>>> WSO2, Inc.: http://wso2.com
>>>>>
>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>> Mobile: +94774273955
>>>>
>>>> --
>>>> *Sinthuja Rajendran*
>>>> Associate Technical Lead
>>>> WSO2,
Inc.: http://wso2.com
>>>>
>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>> Mobile: +94774273955
>>>
>>> --
>>> *Anjana Fernando*
>>> Senior Technical Lead
>>> WSO2 Inc. | http://wso2.com
>>> lean . enterprise . middleware
>>
>> --
>> *Sinthuja Rajendran*
>> Associate Technical Lead
>> WSO2, Inc.: http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>>
>> _______________________________________________
>> Dev mailing list
>> [email protected]
>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>
> --
> Gokul Balakrishnan
> Senior Software Engineer,
> WSO2, Inc. http://wso2.com
> Mob: +94 77 593 5789 | +1 650 272 9927

--
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
