All, sorry for jumping in, but shouldn't we expect this behaviour when we are using a blocking executor?
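To illustrate: a blocking publish path behaves like a bounded blocking queue. Once the consumer (here, the receiver) stops draining, the producer's put() simply waits, which from the outside looks exactly like a hung publisher. A minimal sketch with java.util.concurrent (the class name and queue size are illustrative only, not the actual DataEndpointGroup internals):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingPublisherSketch {
    public static void main(String[] args) throws InterruptedException {
        // Small bounded queue standing in for the agent's event queue
        // (QueueSize in the agent config; 4 here for illustration).
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(4);

        // Fill the queue; the "receiver" consumes nothing, so nothing drains.
        for (int i = 0; i < 4; i++) {
            queue.put("event-" + i);
        }

        // A timed offer models what the publisher experiences: with the
        // queue full and no consumer, the call cannot complete.
        boolean accepted = queue.offer("event-4", 200, TimeUnit.MILLISECONDS);
        System.out.println("accepted while full: " + accepted); // false

        // As soon as the consumer drains one slot, publishing resumes.
        queue.take();
        accepted = queue.offer("event-4", 200, TimeUnit.MILLISECONDS);
        System.out.println("accepted after drain: " + accepted); // true
    }
}
```

So a publisher that stops and later resumes, as described below, is consistent with back-pressure from a full receiver queue rather than a client-side hang.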
On Tue, Sep 1, 2015 at 11:00 AM, Sinthuja Ragendran <[email protected]> wrote:

> Hi Anjana,
>
>
> On Mon, Aug 31, 2015 at 10:23 PM, Anjana Fernando <[email protected]> wrote:
>
>> Hi Sinthuja,
>>
>> For this, disable the indexing stuff and try the tests. Here we are
>> testing just the publishing of events to the server; indexing will add
>> more load to the database and can make it hit timeouts etc. For that, we
>> can use a different record store for the index staging part and so on.
>>
>
> Yeah, the latest test I ran was without indexing and with the CApp Maninda
> used for testing. Further, as mentioned, I was able to see the publisher
> stop at times and then resume. At the time the publisher was stopped, I
> took the thread dump which was attached in the last mail. There you can
> see the receiver queue is full, and the worker threads are busy inserting
> the records into the DAL. I expect this is the same situation in Maninda's
> setup as well, but we have to get a thread dump of DAS to confirm it.
>
> Thanks,
> Sinthuja.
>
>> Cheers,
>> Anjana.
>>
>> On Tue, Sep 1, 2015 at 10:47 AM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Maninda,
>>>
>>> I did a test with MySQL, and I was able to publish 10M events. There
>>> were some hiccups as I mentioned before, and I could see that the
>>> receiver queue was full while the event sink worker threads were writing
>>> to the database. Please refer to the attached thread dump, which was
>>> taken while the publisher was paused due to this. Please run the test
>>> from your side and share your observations.
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>> On Mon, Aug 31, 2015 at 8:50 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>
>>>> Hi Maninda,
>>>>
>>>> I'll also test with MySQL on a local machine in the meantime.
>>>> Incidentally, I observed really high CPU usage from DAS while the
>>>> publisher was normal, the other way around from your observation.
>>>> Please include the System.out printout in the agent code as discussed
>>>> offline and share the results.
>>>>
>>>> Thanks,
>>>> Sinthuja.
>>>>
>>>> On Mon, Aug 31, 2015 at 8:46 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>>
>>>>> Hi Sinthuja,
>>>>>
>>>>> I have used MySQL in RDS, and I have used an indexing-disabled version
>>>>> of the smart home CApp to isolate issues. I have attached it here. I
>>>>> could not see any errors on the DAS side, and that may explain the
>>>>> lower CPU usage in DAS than in the publisher compared with your setup,
>>>>> as we discussed offline.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> *Maninda Edirisooriya*
>>>>> Senior Software Engineer
>>>>>
>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>
>>>>> *Blog* : http://maninda.blogspot.com/
>>>>> *E-mail* : [email protected]
>>>>> *Skype* : @manindae
>>>>> *Twitter* : @maninda
>>>>>
>>>>> On Tue, Sep 1, 2015 at 8:06 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>
>>>>>> Hi Maninda,
>>>>>>
>>>>>> I tested this locally now, and I was able to see some hiccups when
>>>>>> publishing. At the point when the publisher had more or less paused
>>>>>> publishing, I started a new publisher, and that one also succeeded
>>>>>> only until its event queue became full, after which it also stopped
>>>>>> pushing. Can you confirm that the same behaviour was observed in your
>>>>>> publisher? I think this may have made you think the publisher had
>>>>>> hung, but actually the receiver queue was full, so it stopped
>>>>>> accepting further events.
>>>>>>
>>>>>> And during that time, I was able to see multiple error logs on the
>>>>>> DAS side. Therefore I think the event-persisting threads have become
>>>>>> very slow, and hence this behaviour was observed. I have attached the
>>>>>> DAS thread dump, and I could see many threads in a blocked state on
>>>>>> the H2 database. What is the database that you are using to test?
>>>>>> I think it would be better if you try with MySQL or some other
>>>>>> production-recommended database.
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>> [2015-08-31 19:17:04,359] ERROR {org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer} -
>>>>>> Error in processing index batch operations: [-1000:__INDEX_DATA__] does not exist
>>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsTableNotAvailableException: [-1000:__INDEX_DATA__] does not exist
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.get(RDBMSAnalyticsRecordStore.java:319)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.loadIndexOperationRecords(AnalyticsDataIndexer.java:588)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:391)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsException: Error in deleting records: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>>> [50200-140]
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:519)
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:491)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.deleteIndexRecords(AnalyticsDataIndexer.java:581)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:414)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>>     at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>>> [50200-140]
>>>>>>     at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
>>>>>>     at org.h2.message.DbException.get(DbException.java:167)
>>>>>>     at org.h2.message.DbException.get(DbException.java:144)
>>>>>>     at org.h2.table.RegularTable.doLock(RegularTable.java:466)
>>>>>>     at org.h2.table.RegularTable.lock(RegularTable.java:404)
>>>>>>     at org.h2.command.dml.Delete.update(Delete.java:50)
>>>>>>     at org.h2.command.CommandContainer.update(CommandContainer.java:70)
>>>>>>     at org.h2.command.Command.executeUpdate(Command.java:199)
>>>>>>     at org.h2.jdbc.JdbcPreparedStatement.executeUpdateInternal(JdbcPreparedStatement.java:141)
>>>>>>     at org.h2.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:127)
>>>>>>     at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:514)
>>>>>>     ... 9 more
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 10:01 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Maninda,
>>>>>>>
>>>>>>> OK, thanks for the information. I'll do the test locally and get
>>>>>>> back to you.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sinthuja.
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 9:53 AM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Sinthuja,
>>>>>>>>
>>>>>>>> I tested the smart-home sample on the latest DAS with the agent
>>>>>>>> config [1] and the attached config directory (where
>>>>>>>> data-bridge-config.xml is as in [2]).
>>>>>>>> I ran the test on EC2 instances with a MySQL RDS instance as the DB.
>>>>>>>> This issue was always reproducible when 10M events were published
>>>>>>>> with the sample. For some time events get published, and then DAS
>>>>>>>> suddenly stops receiving events. But you can see from the CPU usage
>>>>>>>> that the client is busy while DAS is almost idle.
>>>>>>>> No debug or logging was enabled.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>> <Agent>
>>>>>>>>   <Name>Thrift</Name>
>>>>>>>>   <DataEndpointClass>org.wso2.carbon.databridge.agent.endpoint.thrift.ThriftDataEndpoint</DataEndpointClass>
>>>>>>>>
>>>>>>>>   <TrustSore>src/main/resources/client-truststore.jks</TrustSore>
>>>>>>>>   <TrustSorePassword>wso2carbon</TrustSorePassword>
>>>>>>>>   <QueueSize>32768</QueueSize>
>>>>>>>>   <BatchSize>200</BatchSize>
>>>>>>>>   <CorePoolSize>5</CorePoolSize>
>>>>>>>>   <MaxPoolSize>10</MaxPoolSize>
>>>>>>>>   <KeepAliveTimeInPool>20</KeepAliveTimeInPool>
>>>>>>>>   <ReconnectionInterval>30</ReconnectionInterval>
>>>>>>>>   <MaxTransportPoolSize>250</MaxTransportPoolSize>
>>>>>>>>   <MaxIdleConnections>250</MaxIdleConnections>
>>>>>>>>   <EvictionTimePeriod>5500</EvictionTimePeriod>
>>>>>>>>   <MinIdleTimeInPool>5000</MinIdleTimeInPool>
>>>>>>>>   <SecureMaxTransportPoolSize>250</SecureMaxTransportPoolSize>
>>>>>>>>   <SecureMaxIdleConnections>250</SecureMaxIdleConnections>
>>>>>>>>   <SecureEvictionTimePeriod>5500</SecureEvictionTimePeriod>
>>>>>>>>   <SecureMinIdleTimeInPool>5000</SecureMinIdleTimeInPool>
>>>>>>>> </Agent>
>>>>>>>>
>>>>>>>> [2]
>>>>>>>>
>>>>>>>> <dataBridgeConfiguration>
>>>>>>>>
>>>>>>>>   <workerThreads>10</workerThreads>
>>>>>>>>   <eventBufferCapacity>1000</eventBufferCapacity>
>>>>>>>>   <clientTimeoutMin>30</clientTimeoutMin>
>>>>>>>>
>>>>>>>>   <dataReceiver name="Thrift">
>>>>>>>>     <config name="tcpPort">7611</config>
>>>>>>>>     <config name="sslPort">7711</config>
>>>>>>>>   </dataReceiver>
>>>>>>>>
>>>>>>>>   <dataReceiver name="Binary">
>>>>>>>>     <config name="tcpPort">9611</config>
>>>>>>>>     <config name="sslPort">9711</config>
>>>>>>>>     <config name="sslReceiverThreadPoolSize">100</config>
>>>>>>>>     <config name="tcpReceiverThreadPoolSize">100</config>
>>>>>>>>   </dataReceiver>
>>>>>>>>
>>>>>>>> </dataBridgeConfiguration>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Maninda Edirisooriya*
>>>>>>>> Senior Software Engineer
>>>>>>>>
>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>
>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>> *E-mail* : [email protected]
>>>>>>>> *Skype* : @manindae
>>>>>>>> *Twitter* : @maninda
>>>>>>>>
>>>>>>>> On Mon, Aug 31, 2015 at 8:08 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Are you running with debug-mode logging? And can you reproduce this
>>>>>>>>> consistently, or is it intermittent?
>>>>>>>>>
>>>>>>>>> Please provide the publisher- and receiver-side configs so I can
>>>>>>>>> test this and see. As I have already tested with more than 10M
>>>>>>>>> records, I'm not sure what the case is here.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sinthuja.
>>>>>>>>>
>>>>>>>>> On Monday, August 31, 2015, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> When I started a 10M-event load test with the Smart Home sample
>>>>>>>>>> against DAS, it ran for some time and then suddenly stopped
>>>>>>>>>> receiving events.
>>>>>>>>>> The publisher on the client side was running at high CPU usage
>>>>>>>>>> while DAS was running at very low CPU.
>>>>>>>>>> When another data agent was spawned, it started to publish
>>>>>>>>>> correctly, which seemed to confirm that the issue is on the
>>>>>>>>>> client side.
>>>>>>>>>> We analyzed the thread dump twice, and both times found that the
>>>>>>>>>> busiest thread had the following stack traces.
>>>>>>>>>>
>>>>>>>>>> 1.
>>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:148)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>>
>>>>>>>>>> 2.
>>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>>     at org.apache.log4j.Category.callAppenders(Category.java:202)
>>>>>>>>>>     at org.apache.log4j.Category.forcedLog(Category.java:391)
>>>>>>>>>>     at org.apache.log4j.Category.log(Category.java:856)
>>>>>>>>>>     at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:177)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.isActiveDataEndpointExists(DataEndpointGroup.java:264)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.access$400(DataEndpointGroup.java:46)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:155)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>>     at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>>     at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We suspect that the *isActiveDataEndpointExists()* method of the
>>>>>>>>>> *org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup*
>>>>>>>>>> class is called repeatedly because the disruptor ring buffer is
>>>>>>>>>> full on the client side. We are not sure why this happens.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>
>>>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>>>
>>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>>> *E-mail* : [email protected]
>>>>>>>>>> *Skype* : @manindae
>>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sent from iPhone
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Sinthuja Rajendran*
>>>>>>> Associate Technical Lead
>>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>>
>>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>>> Mobile: +94774273955
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Sinthuja Rajendran*
>>>>>> Associate Technical Lead
>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>
>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>> Mobile: +94774273955
>>>>>
>>>>
>>>> --
>>>> *Sinthuja Rajendran*
>>>> Associate Technical Lead
>>>> WSO2, Inc.: http://wso2.com
>>>>
>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>> Mobile: +94774273955
>>>
>>>
>>> --
>>> *Sinthuja Rajendran*
>>> Associate Technical Lead
>>> WSO2, Inc.: http://wso2.com
>>>
>>> Blog: http://sinthu-rajan.blogspot.com/
>>> Mobile: +94774273955
>>
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc.
>> | http://wso2.com
>> lean . enterprise . middleware
>
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.: http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
> _______________________________________________
> Dev mailing list
> [email protected]
> http://wso2.org/cgi-bin/mailman/listinfo/dev

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
