Hi Sinthuja,

For this, disable indexing and rerun the tests. Here we are only testing
the publishing of events to the server; indexing adds extra load to the
database and can cause timeouts etc. To address that, we can use a
separate record store for the index staging part, and so on.
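As a side note, the "publisher pauses when the receiver can't keep up" behaviour discussed in this thread can be illustrated with a minimal standalone sketch. This is only an analogy: a plain ArrayBlockingQueue stands in for the agent's Disruptor ring buffer, and all names and numbers here are illustrative, not the actual DataPublisher code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {

    // Publishes 'count' events into a queue of the given capacity while a
    // consumer drains one event every 'consumerDelayMs' ms; returns the time
    // the producer spent, in ms. Once the queue is full, put() blocks, so a
    // thread dump shows the producer "stuck" in put() even though nothing is
    // wrong on its side -- the consumer is simply too slow.
    static long runDemo(int capacity, int count, long consumerDelayMs)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    queue.take();
                    Thread.sleep(consumerDelayMs); // slow persistence step
                }
            } catch (InterruptedException e) {
                // exit quietly when the demo is done
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            queue.put(i); // blocks while the queue is at capacity
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        consumer.interrupt();
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long ms = runDemo(4, 20, 50);
        System.out.println("producer took ~" + ms + " ms for 20 events");
    }
}
```

With a capacity of 4, 20 events, and a 50 ms consumer delay, the producer spends most of its time blocked in put() — the same symptom as the apparently "hung" publisher in the thread dumps below.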

Cheers,
Anjana.

On Tue, Sep 1, 2015 at 10:47 AM, Sinthuja Ragendran <[email protected]>
wrote:

> Hi Maninda,
>
> I did a test with MySQL, and I was able to publish 10M events. There were
> some hiccups, as I mentioned before: I could see that the receiver queue was
> full while the event sink worker threads were writing to the database.
> Please refer to the attached thread dump, which was taken while the
> publisher was paused due to this. Please run the test from your side as
> well and share your observations.
>
> Thanks,
> Sinthuja.
>
> On Mon, Aug 31, 2015 at 8:50 PM, Sinthuja Ragendran <[email protected]>
> wrote:
>
>> Hi Maninda,
>>
>> In the meantime, I'll also test with MySQL on my local machine. Notably, I
>> observed very high CPU usage from DAS while the publisher was normal, which
>> is the other way around from your observation. Please add the System.out
>> prints to the agent code as discussed offline and share the results.
>>
>> Thanks,
>> Sinthuja.
>>
>> On Mon, Aug 31, 2015 at 8:46 PM, Maninda Edirisooriya <[email protected]>
>> wrote:
>>
>>> Hi Sinthuja,
>>>
>>> I used MySQL on RDS, and an indexing-disabled version of the Smart Home
>>> CApp to isolate the issue; I have attached it here. I could not see any
>>> errors on the DAS side, which may explain the lower CPU usage in DAS
>>> relative to the publisher in your setup, as we discussed offline.
>>>
>>> Thanks.
>>>
>>>
>>> *Maninda Edirisooriya*
>>> Senior Software Engineer
>>>
>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>
>>> *Blog* : http://maninda.blogspot.com/
>>> *E-mail* : [email protected]
>>> *Skype* : @manindae
>>> *Twitter* : @maninda
>>>
>>> On Tue, Sep 1, 2015 at 8:06 AM, Sinthuja Ragendran <[email protected]>
>>> wrote:
>>>
>>>> Hi Maninda,
>>>>
>>>> I tested this locally now, and I was able to see some hiccups when
>>>> publishing. At the point when the publisher had more or less paused
>>>> publishing, I started a new publisher, and it too succeeded only until
>>>> its event queue became full, after which it also stopped pushing. Can you
>>>> confirm that the same behaviour was observed with your publisher? I think
>>>> this may have made you think the publisher had hung, but actually the
>>>> receiver queue was full and it stopped accepting further events.
>>>>
>>>> During that time, I was able to see multiple error logs on the DAS side
>>>> [1]. Therefore I think the event persisting threads have become very
>>>> slow, and hence this behaviour was observed. I have attached the DAS
>>>> thread dump, and I could see many threads in a blocked state on the H2
>>>> database. Which database are you using for the test? I think it is better
>>>> to try with MySQL or another production-recommended database.
>>>>
>>>> [1]
>>>>
>>>> [2015-08-31 19:17:04,359] ERROR
>>>> {org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer} -
>>>> Error in processing index batch operations: [-1000:__INDEX_DATA__] does not
>>>> exist
>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsTableNotAvailableException:
>>>> [-1000:__INDEX_DATA__] does not exist
>>>>     at
>>>> org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.get(RDBMSAnalyticsRecordStore.java:319)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.loadIndexOperationRecords(AnalyticsDataIndexer.java:588)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:391)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>     at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsException:
>>>> Error in deleting records: Timeout trying to lock table "ANX___8GIVT7RC_";
>>>> SQL statement:
>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN
>>>> (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) [50200-140]
>>>>     at
>>>> org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:519)
>>>>     at
>>>> org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:491)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.deleteIndexRecords(AnalyticsDataIndexer.java:581)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:414)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>     at
>>>> org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>     at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table
>>>> "ANX___8GIVT7RC_"; SQL statement:
>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN
>>>> (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) [50200-140]
>>>>     at
>>>> org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
>>>>     at org.h2.message.DbException.get(DbException.java:167)
>>>>     at org.h2.message.DbException.get(DbException.java:144)
>>>>     at org.h2.table.RegularTable.doLock(RegularTable.java:466)
>>>>     at org.h2.table.RegularTable.lock(RegularTable.java:404)
>>>>     at org.h2.command.dml.Delete.update(Delete.java:50)
>>>>     at org.h2.command.CommandContainer.update(CommandContainer.java:70)
>>>>     at org.h2.command.Command.executeUpdate(Command.java:199)
>>>>     at
>>>> org.h2.jdbc.JdbcPreparedStatement.executeUpdateInternal(JdbcPreparedStatement.java:141)
>>>>     at
>>>> org.h2.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:127)
>>>>     at
>>>> org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:514)
>>>>     ... 9 more
>>>>
>>>>
>>>> On Mon, Aug 31, 2015 at 10:01 AM, Sinthuja Ragendran <[email protected]
>>>> > wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> Ok, thanks for the information. I'll do the test locally and get back to
>>>>> you.
>>>>>
>>>>> Thanks,
>>>>> Sinthuja.
>>>>>
>>>>> On Mon, Aug 31, 2015 at 9:53 AM, Maninda Edirisooriya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Sinthuja,
>>>>>>
>>>>>> I tested the smart-home sample against the latest DAS, using the agent
>>>>>> config in [1] and the attached config directory (its
>>>>>> data-bridge-config.xml is shown in [2]).
>>>>>> I ran the test on EC2 instances, with a MySQL RDS instance as the
>>>>>> database.
>>>>>> The issue was always reproducible when 10M events were published with
>>>>>> the sample: events get published for some time, and then DAS suddenly
>>>>>> stops receiving them. Meanwhile the client stays busy at high CPU usage
>>>>>> while DAS is almost idle.
>>>>>> No debug logging was enabled.
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>>     <Agent>
>>>>>>         <Name>Thrift</Name>
>>>>>>         <DataEndpointClass>org.wso2.carbon.databridge.agent.endpoint.thrift.ThriftDataEndpoint</DataEndpointClass>
>>>>>>         <TrustSore>src/main/resources/client-truststore.jks</TrustSore>
>>>>>>         <TrustSorePassword>wso2carbon</TrustSorePassword>
>>>>>>         <QueueSize>32768</QueueSize>
>>>>>>         <BatchSize>200</BatchSize>
>>>>>>         <CorePoolSize>5</CorePoolSize>
>>>>>>         <MaxPoolSize>10</MaxPoolSize>
>>>>>>         <KeepAliveTimeInPool>20</KeepAliveTimeInPool>
>>>>>>         <ReconnectionInterval>30</ReconnectionInterval>
>>>>>>         <MaxTransportPoolSize>250</MaxTransportPoolSize>
>>>>>>         <MaxIdleConnections>250</MaxIdleConnections>
>>>>>>         <EvictionTimePeriod>5500</EvictionTimePeriod>
>>>>>>         <MinIdleTimeInPool>5000</MinIdleTimeInPool>
>>>>>>         <SecureMaxTransportPoolSize>250</SecureMaxTransportPoolSize>
>>>>>>         <SecureMaxIdleConnections>250</SecureMaxIdleConnections>
>>>>>>         <SecureEvictionTimePeriod>5500</SecureEvictionTimePeriod>
>>>>>>         <SecureMinIdleTimeInPool>5000</SecureMinIdleTimeInPool>
>>>>>>     </Agent>
>>>>>>
>>>>>> [2]
>>>>>>
>>>>>> <dataBridgeConfiguration>
>>>>>>
>>>>>>     <workerThreads>10</workerThreads>
>>>>>>     <eventBufferCapacity>1000</eventBufferCapacity>
>>>>>>     <clientTimeoutMin>30</clientTimeoutMin>
>>>>>>
>>>>>>     <dataReceiver name="Thrift">
>>>>>>         <config name="tcpPort">7611</config>
>>>>>>         <config name="sslPort">7711</config>
>>>>>>     </dataReceiver>
>>>>>>
>>>>>>     <dataReceiver name="Binary">
>>>>>>         <config name="tcpPort">9611</config>
>>>>>>         <config name="sslPort">9711</config>
>>>>>>         <config name="sslReceiverThreadPoolSize">100</config>
>>>>>>         <config name="tcpReceiverThreadPoolSize">100</config>
>>>>>>     </dataReceiver>
>>>>>>
>>>>>> </dataBridgeConfiguration>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> *Maninda Edirisooriya*
>>>>>> Senior Software Engineer
>>>>>>
>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>
>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>> *E-mail* : [email protected]
>>>>>> *Skype* : @manindae
>>>>>> *Twitter* : @maninda
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 8:08 PM, Sinthuja Ragendran <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Are you running with debug logging enabled? Can you reproduce this
>>>>>>> consistently, or is it intermittent?
>>>>>>>
>>>>>>> Please share the publisher- and receiver-side configs so I can test
>>>>>>> this and see. As I have already tested with more than 10M records, I'm
>>>>>>> not sure what the issue is here.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sinthuja.
>>>>>>>
>>>>>>>
>>>>>>> On Monday, August 31, 2015, Maninda Edirisooriya <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> When I started a 10M load test with the Smart Home sample against
>>>>>>>> DAS, it ran for some time and then suddenly stopped receiving events.
>>>>>>>> The publisher on the client side was running at high CPU usage while
>>>>>>>> DAS was running at very low CPU.
>>>>>>>> When another data agent was spawned, it started publishing correctly,
>>>>>>>> which suggested the issue was on the client side.
>>>>>>>> We analyzed the thread dump twice, and both times the busiest thread
>>>>>>>> had the following stack traces.
>>>>>>>>
>>>>>>>> 1.
>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable
>>>>>>>> [0x00007f85f250f000]
>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>     at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:148)
>>>>>>>>     at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>     at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>     at
>>>>>>>> org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>     at
>>>>>>>> org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown
>>>>>>>> Source)
>>>>>>>>     at
>>>>>>>> org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown 
>>>>>>>> Source)
>>>>>>>>
>>>>>>>> 2.
>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable
>>>>>>>> [0x00007f85f250f000]
>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>         at
>>>>>>>> org.apache.log4j.Category.callAppenders(Category.java:202)
>>>>>>>>         at org.apache.log4j.Category.forcedLog(Category.java:391)
>>>>>>>>         at org.apache.log4j.Category.log(Category.java:856)
>>>>>>>>         at
>>>>>>>> org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:177)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.isActiveDataEndpointExists(DataEndpointGroup.java:264)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.access$400(DataEndpointGroup.java:46)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:155)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown
>>>>>>>> Source)
>>>>>>>>         at
>>>>>>>> org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown 
>>>>>>>> Source)
>>>>>>>>
>>>>>>>>
>>>>>>>> We suspect that the *isActiveDataEndpointExists()* method in the
>>>>>>>> *org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup* class
>>>>>>>> is called repeatedly because the Disruptor ring buffer on the client
>>>>>>>> side is full. We are not sure why this happens.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Maninda Edirisooriya*
>>>>>>>> Senior Software Engineer
>>>>>>>>
>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>
>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>> *E-mail* : [email protected]
>>>>>>>> *Skype* : @manindae
>>>>>>>> *Twitter* : @maninda
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from iPhone
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Sinthuja Rajendran*
>>>>> Associate Technical Lead
>>>>> WSO2, Inc.:http://wso2.com
>>>>>
>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>> Mobile: +94774273955
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Sinthuja Rajendran*
>>>> Associate Technical Lead
>>>> WSO2, Inc.:http://wso2.com
>>>>
>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>> Mobile: +94774273955
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> *Sinthuja Rajendran*
>> Associate Technical Lead
>> WSO2, Inc.:http://wso2.com
>>
>> Blog: http://sinthu-rajan.blogspot.com/
>> Mobile: +94774273955
>>
>>
>>
>
>
> --
> *Sinthuja Rajendran*
> Associate Technical Lead
> WSO2, Inc.:http://wso2.com
>
> Blog: http://sinthu-rajan.blogspot.com/
> Mobile: +94774273955
>
>
>


-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
