All, sorry for jumping in, but shouldn't we expect this behaviour when we are using a blocking executor?
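To illustrate: a blocking publish path behaves like a bounded blocking queue. Once the consumer (here, the receiver) stops draining, the producer's put() simply waits, which from the outside looks exactly like a hung publisher. A minimal sketch with java.util.concurrent (the class name and queue size are illustrative only, not the actual DataEndpointGroup internals):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingPublisherSketch {
    public static void main(String[] args) throws InterruptedException {
        // Small bounded queue standing in for the agent's event queue
        // (QueueSize in the agent config; 4 here for illustration).
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);

        // Fill the queue; the "receiver" consumes nothing, so nothing drains.
        for (int i = 0; i < 4; i++) {
            queue.put("event-" + i);
        }

        // A timed offer models what the publisher experiences: with the
        // queue full and no consumer, the call cannot complete.
        boolean accepted = queue.offer("event-4", 200, TimeUnit.MILLISECONDS);
        System.out.println("accepted while full: " + accepted); // false

        // As soon as the consumer drains one slot, publishing resumes.
        queue.take();
        accepted = queue.offer("event-4", 200, TimeUnit.MILLISECONDS);
        System.out.println("accepted after drain: " + accepted); // true
    }
}
```

So a publisher that stops and later resumes, as described below, is consistent with back-pressure from a full receiver queue rather than a client-side hang.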
On Tue, Sep 1, 2015 at 11:00 AM, Sinthuja Ragendran <[email protected]> wrote:

> Hi Anjana,
>
>
> On Mon, Aug 31, 2015 at 10:23 PM, Anjana Fernando <[email protected]> wrote:
>
>> Hi Sinthuja,
>>
>> For this, disable the indexing stuff and try the tests. Here we are
>> testing just the publishing of events to the server; indexing will add
>> more load to the database and can make it hit timeouts etc. For that, we
>> can use a different record store for the index staging part and so on.
>>
>
> Yeah, the latest test I ran was without indexing and with the CApp Maninda
> used for testing. Further, as mentioned, I was able to see the publisher
> stop at times and then resume. At the time the publisher was stopped, I
> took the thread dump which was attached in the last mail. There you can
> see the receiver queue is full, and the worker threads are busy inserting
> the records into the DAL. I expect this is the same situation in Maninda's
> setup as well, but we have to get a thread dump of DAS to confirm it.
>
> Thanks,
> Sinthuja.
>
>> Cheers,
>> Anjana.
>>
>> On Tue, Sep 1, 2015 at 10:47 AM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Maninda,
>>>
>>> I did a test with MySQL, and I was able to publish 10M events. There
>>> were some hiccups as I mentioned before, and I could see that the
>>> receiver queue was full while the event sink worker threads were writing
>>> to the database. Please refer to the attached thread dump, which was
>>> taken while the publisher was paused due to this. Please run the test
>>> from your side and share your observations.
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>> On Mon, Aug 31, 2015 at 8:50 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>
>>>> Hi Maninda,
>>>>
>>>> I'll also test with MySQL on a local machine in the meantime.
>>>> Incidentally, I observed really high CPU usage from DAS while the
>>>> publisher was normal, the other way around from your observation.
>>>> Please include the System.out printout in the agent code as discussed
>>>> offline and share the results.
>>>>
>>>> Thanks,
>>>> Sinthuja.
>>>>
>>>> On Mon, Aug 31, 2015 at 8:46 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>
>>>>> Hi Sinthuja,
>>>>>
>>>>> I have used MySQL in RDS, and I have used an indexing-disabled version
>>>>> of the smart home CApp to isolate issues. I have attached it here. I
>>>>> could not see any errors on the DAS side, and that may explain the
>>>>> lower CPU usage in DAS than in the publisher compared with your setup,
>>>>> as we discussed offline.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> *Maninda Edirisooriya*
>>>>> Senior Software Engineer
>>>>>
>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>
>>>>> *Blog* : http://maninda.blogspot.com/
>>>>> *E-mail* : [email protected]
>>>>> *Skype* : @manindae
>>>>> *Twitter* : @maninda
>>>>>
>>>>> On Tue, Sep 1, 2015 at 8:06 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>
>>>>>> Hi Maninda,
>>>>>>
>>>>>> I tested this locally now, and I was able to see some hiccups when
>>>>>> publishing. At the point when the publisher had more or less paused
>>>>>> publishing, I started a new publisher, and that one also succeeded
>>>>>> only until its event queue became full, after which it also stopped
>>>>>> pushing. Can you confirm that the same behaviour was observed in your
>>>>>> publisher? I think this may have made you think the publisher had
>>>>>> hung, but actually the receiver queue was full, so it stopped
>>>>>> accepting further events.
>>>>>>
>>>>>> And during that time, I was able to see multiple error logs on the
>>>>>> DAS side. Therefore I think the event-persisting threads have become
>>>>>> very slow, and hence this behaviour was observed. I have attached the
>>>>>> DAS thread dump, and I could see many threads in a blocked state on
>>>>>> the H2 database. What is the database that you are using to test?
>>>>>> I think it would be better if you try with MySQL or some other
>>>>>> production-recommended database.
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>> [2015-08-31 19:17:04,359] ERROR {org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer} -
>>>>>> Error in processing index batch operations: [-1000:__INDEX_DATA__] does not exist
>>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsTableNotAvailableException: [-1000:__INDEX_DATA__] does not exist
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.get(RDBMSAnalyticsRecordStore.java:319)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.loadIndexOperationRecords(AnalyticsDataIndexer.java:588)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:391)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsException: Error in deleting records: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>>> [50200-140]
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:519)
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:491)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.deleteIndexRecords(AnalyticsDataIndexer.java:581)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:414)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>>> [50200-140]
>>>>>>     at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
>>>>>>     at org.h2.message.DbException.get(DbException.java:167)
>>>>>>     at org.h2.message.DbException.get(DbException.java:144)
>>>>>>     at org.h2.table.RegularTable.doLock(RegularTable.java:466)
>>>>>>     at org.h2.table.RegularTable.lock(RegularTable.java:404)
>>>>>>     at org.h2.command.dml.Delete.update(Delete.java:50)
>>>>>>     at org.h2.command.CommandContainer.update(CommandContainer.java:70)
>>>>>>     at org.h2.command.Command.executeUpdate(Command.java:199)
>>>>>>     at org.h2.jdbc.JdbcPreparedStatement.executeUpdateInternal(JdbcPreparedStatement.java:141)
>>>>>>     at org.h2.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:127)
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:514)
>>>>>>     ... 9 more
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 10:01 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Maninda,
>>>>>>>
>>>>>>> OK, thanks for the information. I'll do the test locally and get
>>>>>>> back to you.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sinthuja.
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 9:53 AM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Sinthuja,
>>>>>>>>
>>>>>>>> I tested the smart-home sample on the latest DAS with the agent
>>>>>>>> config [1] and the attached config directory (where
>>>>>>>> data-bridge-config.xml is as in [2]).
>>>>>>>> I ran the test on EC2 instances with a MySQL RDS instance as the DB.
>>>>>>>> This issue was always reproducible when 10M events were published
>>>>>>>> with the sample. For some time events get published, and then DAS
>>>>>>>> suddenly stops receiving events. But you can see from the CPU usage
>>>>>>>> that the client is busy while DAS is almost idle.
>>>>>>>> No debug or logging was enabled.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>> <Agent>
>>>>>>>>   <Name>Thrift</Name>
>>>>>>>>   <DataEndpointClass>org.wso2.carbon.databridge.agent.endpoint.thrift.ThriftDataEndpoint</DataEndpointClass>
>>>>>>>>
>>>>>>>>   <TrustSore>src/main/resources/client-truststore.jks</TrustSore>
>>>>>>>>   <TrustSorePassword>wso2carbon</TrustSorePassword>
>>>>>>>>   <QueueSize>32768</QueueSize>
>>>>>>>>   <BatchSize>200</BatchSize>
>>>>>>>>   <CorePoolSize>5</CorePoolSize>
>>>>>>>>   <MaxPoolSize>10</MaxPoolSize>
>>>>>>>>   <KeepAliveTimeInPool>20</KeepAliveTimeInPool>
>>>>>>>>   <ReconnectionInterval>30</ReconnectionInterval>
>>>>>>>>   <MaxTransportPoolSize>250</MaxTransportPoolSize>
>>>>>>>>   <MaxIdleConnections>250</MaxIdleConnections>
>>>>>>>>   <EvictionTimePeriod>5500</EvictionTimePeriod>
>>>>>>>>   <MinIdleTimeInPool>5000</MinIdleTimeInPool>
>>>>>>>>   <SecureMaxTransportPoolSize>250</SecureMaxTransportPoolSize>
>>>>>>>>   <SecureMaxIdleConnections>250</SecureMaxIdleConnections>
>>>>>>>>   <SecureEvictionTimePeriod>5500</SecureEvictionTimePeriod>
>>>>>>>>   <SecureMinIdleTimeInPool>5000</SecureMinIdleTimeInPool>
>>>>>>>> </Agent>
>>>>>>>>
>>>>>>>> [2]
>>>>>>>>
>>>>>>>> <dataBridgeConfiguration>
>>>>>>>>
>>>>>>>>   <workerThreads>10</workerThreads>
>>>>>>>>   <eventBufferCapacity>1000</eventBufferCapacity>
>>>>>>>>   <clientTimeoutMin>30</clientTimeoutMin>
>>>>>>>>
>>>>>>>>   <dataReceiver name="Thrift">
>>>>>>>>     <config name="tcpPort">7611</config>
>>>>>>>>     <config name="sslPort">7711</config>
>>>>>>>>   </dataReceiver>
>>>>>>>>
>>>>>>>>   <dataReceiver name="Binary">
>>>>>>>>     <config name="tcpPort">9611</config>
>>>>>>>>     <config name="sslPort">9711</config>
>>>>>>>>     <config name="sslReceiverThreadPoolSize">100</config>
>>>>>>>>     <config name="tcpReceiverThreadPoolSize">100</config>
>>>>>>>>   </dataReceiver>
>>>>>>>>
>>>>>>>> </dataBridgeConfiguration>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Maninda Edirisooriya*
>>>>>>>> Senior Software Engineer
>>>>>>>>
>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>
>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>> *E-mail* : [email protected]
>>>>>>>> *Skype* : @manindae
>>>>>>>> *Twitter* : @maninda
>>>>>>>>
>>>>>>>> On Mon, Aug 31, 2015 at 8:08 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Are you running with debug-mode logging? And can you reproduce this
>>>>>>>>> consistently, or is it intermittent?
>>>>>>>>>
>>>>>>>>> Please provide the publisher- and receiver-side configs so I can
>>>>>>>>> test this and see. As I have already tested with more than 10M
>>>>>>>>> records, I'm not sure what the case is here.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sinthuja.
>>>>>>>>>
>>>>>>>>> On Monday, August 31, 2015, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> When I started a 10M-event load test with the Smart Home sample
>>>>>>>>>> against DAS, it ran for some time and then suddenly stopped
>>>>>>>>>> receiving events.
>>>>>>>>>> The publisher on the client side was running at high CPU usage
>>>>>>>>>> while DAS was running at very low CPU.
>>>>>>>>>> When another data agent was spawned, it started to publish
>>>>>>>>>> correctly, which seemed to confirm that the issue is on the
>>>>>>>>>> client side.
>>>>>>>>>> We analyzed the thread dump twice, and both times found that the
>>>>>>>>>> busiest thread had the following stack traces.
>>>>>>>>>>
>>>>>>>>>> 1.
>>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:148)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>>
>>>>>>>>>> 2.
>>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>>     at org.apache.log4j.Category.callAppenders(Category.java:202)
>>>>>>>>>>     at org.apache.log4j.Category.forcedLog(Category.java:391)
>>>>>>>>>>     at org.apache.log4j.Category.log(Category.java:856)
>>>>>>>>>>     at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:177)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.isActiveDataEndpointExists(DataEndpointGroup.java:264)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.access$400(DataEndpointGroup.java:46)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:155)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We suspect that the *isActiveDataEndpointExists()* method of the
>>>>>>>>>> *org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup*
>>>>>>>>>> class is called repeatedly because the disruptor ring buffer is
>>>>>>>>>> full on the client side. We are not sure why this happens.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>
>>>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>>>
>>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>>> *E-mail* : [email protected]
>>>>>>>>>> *Skype* : @manindae
>>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sent from iPhone
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Sinthuja Rajendran*
>>>>>>> Associate Technical Lead
>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>
>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>> Mobile: +94774273955
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Sinthuja Rajendran*
>>>>>> Associate Technical Lead
>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>
>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>> Mobile: +94774273955
>>>>>
>>>>
>>>> --
>>>> *Sinthuja Rajendran*
>>>> Associate Technical Lead
>>>> WSO2, Inc.: http://wso2.com
>>>>
>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>> Mobile: +94774273955
>>>
>>>
>>> --
>>> *Sinthuja Rajendran*
>>> Associate Technical Lead
>>> WSO2, Inc.: http://wso2.com
>>>
>>> Blog: http://sinthu-rajan.blogspot.com/
>>> Mobile: +94774273955
>>
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc.
>> | http://wso2.com
>> lean . enterprise . middleware
>
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.: http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
> _______________________________________________
> Dev mailing list
> [email protected]
> http://wso2.org/cgi-bin/mailman/listinfo/dev

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
