Hi Anjana,
On Mon, Aug 31, 2015 at 10:23 PM, Anjana Fernando <[email protected]> wrote:

> Hi Sinthuja,
>
> For this, disable the indexing stuff and try the tests. Here, we are just
> testing the publishing of events to the server; indexing will add more load
> to the database and can make it have timeouts etc. For this, we can use a
> different record store for the index staging part and so on.

Yeah, the latest test I ran was without indexing and with the CApp Maninda
used for testing. Further, as mentioned, I was able to see that the publisher
sometimes stops and then resumes. At the time the publisher was stopped, I
got the thread dump which was attached to the last mail. There you can see
the receiver queue is full, and the worker threads are busy inserting the
records into the DAL. I expect this is the same situation in Maninda's setup
as well, but we have to get a thread dump of DAS to confirm it.

Thanks,
Sinthuja.

> Cheers,
> Anjana.
>
> On Tue, Sep 1, 2015 at 10:47 AM, Sinthuja Ragendran <[email protected]> wrote:
>
>> Hi Maninda,
>>
>> I did a test with MySQL, and I was able to publish 10M events. There were
>> some hiccups as I mentioned before, and I could see the receiver queue was
>> full while the event sink worker threads were writing to the database.
>> Please refer to the attached thread dump, which was taken when the
>> publisher was paused due to this. Please run the test from your side and
>> share your observations.
>>
>> Thanks,
>> Sinthuja.
>>
>> On Mon, Aug 31, 2015 at 8:50 PM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Maninda,
>>>
>>> I'll also test with MySQL on my local machine in the meantime; apparently
>>> I observed really high CPU usage from DAS while the publisher was normal,
>>> which is the other way around from your observation. Please add the sout
>>> (System.out prints) to the agent code as discussed offline and share the
>>> results.
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>> On Mon, Aug 31, 2015 at 8:46 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>
>>>> Hi Sinthuja,
>>>>
>>>> I have used MySQL in RDS, and I have used an indexing-disabled version
>>>> of the smart home CApp to isolate issues; I have attached it here. I
>>>> could not see any error on the DAS side, and that may explain the lower
>>>> CPU usage in DAS than in the publisher compared to your setup, as we
>>>> discussed offline.
>>>>
>>>> Thanks.
>>>>
>>>> *Maninda Edirisooriya*
>>>> Senior Software Engineer
>>>>
>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>
>>>> *Blog* : http://maninda.blogspot.com/
>>>> *E-mail* : [email protected]
>>>> *Skype* : @manindae
>>>> *Twitter* : @maninda
>>>>
>>>> On Tue, Sep 1, 2015 at 8:06 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> I tested this locally now, and I was able to see some hiccups when
>>>>> publishing. At the point when the publisher had effectively paused, I
>>>>> started a new publisher, and that one also succeeded only until its
>>>>> event queue became full, after which it too stopped pushing. Can you
>>>>> confirm that the same behaviour was observed in your publisher? I think
>>>>> this made you think the publisher had hung, but actually the receiver
>>>>> queue was full, so it stopped accepting further events.
>>>>>
>>>>> And during that time, I was able to see multiple error logs on the DAS
>>>>> side [1]. Therefore I think the event persisting thread has become very
>>>>> slow, and hence this behaviour was observed. I have attached the DAS
>>>>> thread dump, and I could see many threads in the blocked state on the
>>>>> H2 database. Which database are you using for the test? I think you had
>>>>> better try with MySQL or another production-recommended database.
>>>>>
>>>>> [1]
>>>>>
>>>>> [2015-08-31 19:17:04,359] ERROR {org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer} - Error in processing index batch operations: [-1000:__INDEX_DATA__] does not exist
>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsTableNotAvailableException: [-1000:__INDEX_DATA__] does not exist
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.get(RDBMSAnalyticsRecordStore.java:319)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.loadIndexOperationRecords(AnalyticsDataIndexer.java:588)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:391)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsException: Error in deleting records: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>> [50200-140]
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:519)
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:491)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.deleteIndexRecords(AnalyticsDataIndexer.java:581)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:414)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>> [50200-140]
>>>>>         at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
>>>>>         at org.h2.message.DbException.get(DbException.java:167)
>>>>>         at org.h2.message.DbException.get(DbException.java:144)
>>>>>         at org.h2.table.RegularTable.doLock(RegularTable.java:466)
>>>>>         at org.h2.table.RegularTable.lock(RegularTable.java:404)
>>>>>         at org.h2.command.dml.Delete.update(Delete.java:50)
>>>>>         at org.h2.command.CommandContainer.update(CommandContainer.java:70)
>>>>>         at org.h2.command.Command.executeUpdate(Command.java:199)
>>>>>         at org.h2.jdbc.JdbcPreparedStatement.executeUpdateInternal(JdbcPreparedStatement.java:141)
>>>>>         at org.h2.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:127)
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:514)
>>>>>         ... 9 more
>>>>>
>>>>> On Mon, Aug 31, 2015 at 10:01 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>
>>>>>> Hi Maninda,
>>>>>>
>>>>>> OK, thanks for the information. I'll do the test locally and get back
>>>>>> to you.
>>>>>>
>>>>>> Thanks,
>>>>>> Sinthuja.
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 9:53 AM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Sinthuja,
>>>>>>>
>>>>>>> I tested the smart-home sample on the latest DAS with the agent
>>>>>>> config [1] and DAS with the attached config directory (where
>>>>>>> data-bridge-config.xml is as in [2]).
>>>>>>> I ran the test on EC2 instances with a MySQL RDS instance as the
>>>>>>> database.
>>>>>>> This issue was always reproducible when 10M events were published
>>>>>>> with the sample. Events get published for some time, and then DAS
>>>>>>> suddenly stops receiving them. But you can see from the CPU usage
>>>>>>> that the client is busy while DAS is almost idle.
>>>>>>> No debug logging was enabled.
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>> <Agent>
>>>>>>>     <Name>Thrift</Name>
>>>>>>>     <DataEndpointClass>org.wso2.carbon.databridge.agent.endpoint.thrift.ThriftDataEndpoint</DataEndpointClass>
>>>>>>>     <TrustSore>src/main/resources/client-truststore.jks</TrustSore>
>>>>>>>     <TrustSorePassword>wso2carbon</TrustSorePassword>
>>>>>>>     <QueueSize>32768</QueueSize>
>>>>>>>     <BatchSize>200</BatchSize>
>>>>>>>     <CorePoolSize>5</CorePoolSize>
>>>>>>>     <MaxPoolSize>10</MaxPoolSize>
>>>>>>>     <KeepAliveTimeInPool>20</KeepAliveTimeInPool>
>>>>>>>     <ReconnectionInterval>30</ReconnectionInterval>
>>>>>>>     <MaxTransportPoolSize>250</MaxTransportPoolSize>
>>>>>>>     <MaxIdleConnections>250</MaxIdleConnections>
>>>>>>>     <EvictionTimePeriod>5500</EvictionTimePeriod>
>>>>>>>     <MinIdleTimeInPool>5000</MinIdleTimeInPool>
>>>>>>>     <SecureMaxTransportPoolSize>250</SecureMaxTransportPoolSize>
>>>>>>>     <SecureMaxIdleConnections>250</SecureMaxIdleConnections>
>>>>>>>     <SecureEvictionTimePeriod>5500</SecureEvictionTimePeriod>
>>>>>>>     <SecureMinIdleTimeInPool>5000</SecureMinIdleTimeInPool>
>>>>>>> </Agent>
>>>>>>>
>>>>>>> [2]
>>>>>>>
>>>>>>> <dataBridgeConfiguration>
>>>>>>>
>>>>>>>     <workerThreads>10</workerThreads>
>>>>>>>     <eventBufferCapacity>1000</eventBufferCapacity>
>>>>>>>     <clientTimeoutMin>30</clientTimeoutMin>
>>>>>>>
>>>>>>>     <dataReceiver name="Thrift">
>>>>>>>         <config name="tcpPort">7611</config>
>>>>>>>         <config name="sslPort">7711</config>
>>>>>>>     </dataReceiver>
>>>>>>>
>>>>>>>     <dataReceiver name="Binary">
>>>>>>>         <config name="tcpPort">9611</config>
>>>>>>>         <config name="sslPort">9711</config>
>>>>>>>         <config name="sslReceiverThreadPoolSize">100</config>
>>>>>>>         <config name="tcpReceiverThreadPoolSize">100</config>
>>>>>>>     </dataReceiver>
>>>>>>>
>>>>>>> </dataBridgeConfiguration>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> *Maninda Edirisooriya*
>>>>>>> Senior Software Engineer
>>>>>>>
>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>
>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>> *E-mail* : [email protected]
>>>>>>> *Skype* : @manindae
>>>>>>> *Twitter* : @maninda
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 8:08 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you running with debug-level logging? And can you reproduce this
>>>>>>>> consistently, or is it intermittent?
>>>>>>>>
>>>>>>>> Please provide the publisher- and receiver-side configs so I can
>>>>>>>> test this and see. As I have already tested with more than 10M
>>>>>>>> records, I'm not sure what the problem is here.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sinthuja.
>>>>>>>>
>>>>>>>> On Monday, August 31, 2015, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> When I started a 10M-event load test with the Smart Home sample,
>>>>>>>>> DAS ran for some time and then suddenly stopped receiving events.
>>>>>>>>> But the publisher client was running at high CPU usage while DAS
>>>>>>>>> was at very low CPU.
>>>>>>>>> When another data agent was spawned, it started to publish
>>>>>>>>> correctly, which confirmed that the issue is on the client side.
>>>>>>>>> We analyzed the thread dump twice and found that the busiest thread
>>>>>>>>> had the following stack traces.
>>>>>>>>>
>>>>>>>>> 1.
>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:148)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>
>>>>>>>>> 2.
>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>         at org.apache.log4j.Category.callAppenders(Category.java:202)
>>>>>>>>>         at org.apache.log4j.Category.forcedLog(Category.java:391)
>>>>>>>>>         at org.apache.log4j.Category.log(Category.java:856)
>>>>>>>>>         at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:177)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.isActiveDataEndpointExists(DataEndpointGroup.java:264)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.access$400(DataEndpointGroup.java:46)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:155)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>
>>>>>>>>> We suspect that the *isActiveDataEndpointExists()* method in the
>>>>>>>>> *org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup* class
>>>>>>>>> is called repeatedly because the disruptor ring buffer is filled on
>>>>>>>>> the client side. Not sure why this happens.
>>>>>>>>>
>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>> Senior Software Engineer
>>>>>>>>>
>>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>>
>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>> *E-mail* : [email protected]
>>>>>>>>> *Skype* : @manindae
>>>>>>>>> *Twitter* : @maninda
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sent from iPhone
>>>>>>
>>>>>> --
>>>>>> *Sinthuja Rajendran*
>>>>>> Associate Technical Lead
>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>
>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>> Mobile: +94774273955
>
> --
> *Anjana Fernando*
> Senior Technical Lead
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware

--
*Sinthuja Rajendran*
Associate Technical Lead
WSO2, Inc.: http://wso2.com

Blog: http://sinthu-rajan.blogspot.com/
Mobile: +94774273955
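[Editor's note] The H2 error in [1] ("Timeout trying to lock table") is the indexer's bulk DELETE exceeding H2's table-lock timeout while the receiver workers hold the lock. As a sketch only, H2 exposes a `LOCK_TIMEOUT` connection setting (in milliseconds) that can be raised in the datasource JDBC URL; the database path below is a placeholder, not taken from this thread:

```
jdbc:h2:repository/database/ANALYTICS_EVENT_STORE;LOCK_TIMEOUT=60000
```

This only widens the timeout window; as noted above, moving the event store to MySQL or another production-recommended database is the actual fix.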
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
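[Editor's note] The "hung" publisher described in the thread dumps above is consistent with a producer blocked on a full bounded buffer: `EventQueue.put` cannot complete until the receiver drains events. A minimal, generic Java sketch (not WSO2 code; the class and event names are invented) shows the same behaviour with a standard bounded queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a producer against a small bounded buffer looks "hung" when the
// consumer stalls, mirroring DataEndpointGroup$EventQueue.put blocking
// while the receiver queue is full.
public class BackpressureDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // tiny buffer

        // Fill the buffer; these offers succeed immediately.
        System.out.println(queue.offer("event-1")); // true
        System.out.println(queue.offer("event-2")); // true

        // With no consumer draining, the next offer is rejected. A blocking
        // put() here would park the thread instead -- the producer is not
        // dead, just waiting on a full queue.
        System.out.println(queue.offer("event-3")); // false

        // Once a consumer takes an element, publishing resumes.
        queue.poll();
        System.out.println(queue.offer("event-3")); // true
    }
}
```

Spawning a second agent "fixing" the problem fits this picture: the new agent starts with an empty client-side buffer, so it publishes until its own buffer fills in turn, as Sinthuja observed.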
