Hi Anjana,
On Mon, Aug 31, 2015 at 10:23 PM, Anjana Fernando <[email protected]> wrote:

> Hi Sinthuja,
>
> For this, disable the indexing stuff and try the tests. Here, we are just
> testing the publishing of events to the server; indexing will add more load
> to the database and can make it have timeouts etc. For this, we can use a
> different record store for the index staging part and so on.

Yeah, the latest test I ran was without indexing and with the CApp Maninda
used for testing. Further, as mentioned, I was able to see that the publisher
sometimes stops and then resumes. At the time the publisher was stopped, I
got the thread dump which was attached to the last mail. There you can see
the receiver queue is full, and the worker threads are busy inserting the
records into the DAL. I expect this is the same situation in Maninda's setup
as well, but we have to get a thread dump of DAS to confirm it.

Thanks,
Sinthuja.

> Cheers,
> Anjana.
>
> On Tue, Sep 1, 2015 at 10:47 AM, Sinthuja Ragendran <[email protected]> wrote:
>
>> Hi Maninda,
>>
>> I did a test with MySQL, and I was able to publish 10M events. There were
>> some hiccups as I mentioned before, and I could see the receiver queue was
>> full while the event sink worker threads were writing to the database.
>> Please refer to the attached thread dump, which was taken when the
>> publisher was paused due to this. Please run the test from your side and
>> share your observations.
>>
>> Thanks,
>> Sinthuja.
>>
>> On Mon, Aug 31, 2015 at 8:50 PM, Sinthuja Ragendran <[email protected]> wrote:
>>
>>> Hi Maninda,
>>>
>>> I'll also test with MySQL on my local machine in the meantime; apparently
>>> I observed really high CPU usage from DAS while the publisher was normal,
>>> which is the other way around from your observation. Please add the sout
>>> (System.out prints) to the agent code as discussed offline and share the
>>> results.
>>>
>>> Thanks,
>>> Sinthuja.
>>>
>>> On Mon, Aug 31, 2015 at 8:46 PM, Maninda Edirisooriya <[email protected]> wrote:
>>>
>>>> Hi Sinthuja,
>>>>
>>>> I have used MySQL in RDS, and I have used an indexing-disabled version
>>>> of the smart home CApp to isolate issues; I have attached it here. I
>>>> could not see any error on the DAS side, and that may explain the lower
>>>> CPU usage in DAS than in the publisher compared to your setup, as we
>>>> discussed offline.
>>>>
>>>> Thanks.
>>>>
>>>> *Maninda Edirisooriya*
>>>> Senior Software Engineer
>>>>
>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>
>>>> *Blog* : http://maninda.blogspot.com/
>>>> *E-mail* : [email protected]
>>>> *Skype* : @manindae
>>>> *Twitter* : @maninda
>>>>
>>>> On Tue, Sep 1, 2015 at 8:06 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>
>>>>> Hi Maninda,
>>>>>
>>>>> I tested this locally now, and I was able to see some hiccups when
>>>>> publishing. At the point when the publisher had effectively paused, I
>>>>> started a new publisher, and that one also succeeded only until its
>>>>> event queue became full, after which it too stopped pushing. Can you
>>>>> confirm that the same behaviour was observed in your publisher? I think
>>>>> this made you think the publisher had hung, but actually the receiver
>>>>> queue was full, so it stopped accepting further events.
>>>>>
>>>>> And during that time, I was able to see multiple error logs on the DAS
>>>>> side [1]. Therefore I think the event persisting thread has become very
>>>>> slow, and hence this behaviour was observed. I have attached the DAS
>>>>> thread dump, and I could see many threads in the blocked state on the
>>>>> H2 database. Which database are you using for the test? I think you had
>>>>> better try with MySQL or another production-recommended database.
>>>>>
>>>>> [1]
>>>>>
>>>>> [2015-08-31 19:17:04,359] ERROR {org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer} - Error in processing index batch operations: [-1000:__INDEX_DATA__] does not exist
>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsTableNotAvailableException: [-1000:__INDEX_DATA__] does not exist
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.get(RDBMSAnalyticsRecordStore.java:319)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.loadIndexOperationRecords(AnalyticsDataIndexer.java:588)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:391)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>> org.wso2.carbon.analytics.datasource.commons.exception.AnalyticsException: Error in deleting records: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>> [50200-140]
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:519)
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:491)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.deleteIndexRecords(AnalyticsDataIndexer.java:581)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:414)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.processIndexOperations(AnalyticsDataIndexer.java:381)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer.access$100(AnalyticsDataIndexer.java:130)
>>>>>         at org.wso2.carbon.analytics.dataservice.indexing.AnalyticsDataIndexer$IndexWorker.run(AnalyticsDataIndexer.java:1791)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>> Caused by: org.h2.jdbc.JdbcSQLException: Timeout trying to lock table "ANX___8GIVT7RC_"; SQL statement:
>>>>> DELETE FROM ANX___8GIvT7Rc_ WHERE record_id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
>>>>> [50200-140]
>>>>>         at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
>>>>>         at org.h2.message.DbException.get(DbException.java:167)
>>>>>         at org.h2.message.DbException.get(DbException.java:144)
>>>>>         at org.h2.table.RegularTable.doLock(RegularTable.java:466)
>>>>>         at org.h2.table.RegularTable.lock(RegularTable.java:404)
>>>>>         at org.h2.command.dml.Delete.update(Delete.java:50)
>>>>>         at org.h2.command.CommandContainer.update(CommandContainer.java:70)
>>>>>         at org.h2.command.Command.executeUpdate(Command.java:199)
>>>>>         at org.h2.jdbc.JdbcPreparedStatement.executeUpdateInternal(JdbcPreparedStatement.java:141)
>>>>>         at org.h2.jdbc.JdbcPreparedStatement.executeUpdate(JdbcPreparedStatement.java:127)
>>>>>         at org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore.delete(RDBMSAnalyticsRecordStore.java:514)
>>>>>         ... 9 more
>>>>>
>>>>> On Mon, Aug 31, 2015 at 10:01 AM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>
>>>>>> Hi Maninda,
>>>>>>
>>>>>> OK, thanks for the information. I'll do the test locally and get back
>>>>>> to you.
>>>>>>
>>>>>> Thanks,
>>>>>> Sinthuja.
>>>>>>
>>>>>> On Mon, Aug 31, 2015 at 9:53 AM, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Sinthuja,
>>>>>>>
>>>>>>> I tested the smart-home sample on the latest DAS with the agent
>>>>>>> config [1] and DAS with the attached config directory (where
>>>>>>> data-bridge-config.xml is as in [2]).
>>>>>>> I ran the test on EC2 instances with a MySQL RDS instance as the
>>>>>>> database.
>>>>>>> This issue was always reproducible when 10M events were published
>>>>>>> with the sample. Events get published for some time, and then DAS
>>>>>>> suddenly stops receiving them. But you can see from the CPU usage
>>>>>>> that the client is busy while DAS is almost idle.
>>>>>>> No debug logging was enabled.
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>> <Agent>
>>>>>>>     <Name>Thrift</Name>
>>>>>>>     <DataEndpointClass>org.wso2.carbon.databridge.agent.endpoint.thrift.ThriftDataEndpoint</DataEndpointClass>
>>>>>>>     <TrustSore>src/main/resources/client-truststore.jks</TrustSore>
>>>>>>>     <TrustSorePassword>wso2carbon</TrustSorePassword>
>>>>>>>     <QueueSize>32768</QueueSize>
>>>>>>>     <BatchSize>200</BatchSize>
>>>>>>>     <CorePoolSize>5</CorePoolSize>
>>>>>>>     <MaxPoolSize>10</MaxPoolSize>
>>>>>>>     <KeepAliveTimeInPool>20</KeepAliveTimeInPool>
>>>>>>>     <ReconnectionInterval>30</ReconnectionInterval>
>>>>>>>     <MaxTransportPoolSize>250</MaxTransportPoolSize>
>>>>>>>     <MaxIdleConnections>250</MaxIdleConnections>
>>>>>>>     <EvictionTimePeriod>5500</EvictionTimePeriod>
>>>>>>>     <MinIdleTimeInPool>5000</MinIdleTimeInPool>
>>>>>>>     <SecureMaxTransportPoolSize>250</SecureMaxTransportPoolSize>
>>>>>>>     <SecureMaxIdleConnections>250</SecureMaxIdleConnections>
>>>>>>>     <SecureEvictionTimePeriod>5500</SecureEvictionTimePeriod>
>>>>>>>     <SecureMinIdleTimeInPool>5000</SecureMinIdleTimeInPool>
>>>>>>> </Agent>
>>>>>>>
>>>>>>> [2]
>>>>>>>
>>>>>>> <dataBridgeConfiguration>
>>>>>>>
>>>>>>>     <workerThreads>10</workerThreads>
>>>>>>>     <eventBufferCapacity>1000</eventBufferCapacity>
>>>>>>>     <clientTimeoutMin>30</clientTimeoutMin>
>>>>>>>
>>>>>>>     <dataReceiver name="Thrift">
>>>>>>>         <config name="tcpPort">7611</config>
>>>>>>>         <config name="sslPort">7711</config>
>>>>>>>     </dataReceiver>
>>>>>>>
>>>>>>>     <dataReceiver name="Binary">
>>>>>>>         <config name="tcpPort">9611</config>
>>>>>>>         <config name="sslPort">9711</config>
>>>>>>>         <config name="sslReceiverThreadPoolSize">100</config>
>>>>>>>         <config name="tcpReceiverThreadPoolSize">100</config>
>>>>>>>     </dataReceiver>
>>>>>>>
>>>>>>> </dataBridgeConfiguration>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> *Maninda Edirisooriya*
>>>>>>> Senior Software Engineer
>>>>>>>
>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>
>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>> *E-mail* : [email protected]
>>>>>>> *Skype* : @manindae
>>>>>>> *Twitter* : @maninda
>>>>>>>
>>>>>>> On Mon, Aug 31, 2015 at 8:08 PM, Sinthuja Ragendran <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you running with debug-level logging? And can you reproduce this
>>>>>>>> consistently, or is it intermittent?
>>>>>>>>
>>>>>>>> Please provide the publisher- and receiver-side configs so I can
>>>>>>>> test this and see. As I have already tested with more than 10M
>>>>>>>> records, I'm not sure what the problem is here.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sinthuja.
>>>>>>>>
>>>>>>>> On Monday, August 31, 2015, Maninda Edirisooriya <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> When I started a 10M-event load test with the Smart Home sample,
>>>>>>>>> DAS ran for some time and then suddenly stopped receiving events.
>>>>>>>>> But the publisher client was running at high CPU usage while DAS
>>>>>>>>> was at very low CPU.
>>>>>>>>> When another data agent was spawned, it started to publish
>>>>>>>>> correctly, which confirmed that the issue is on the client side.
>>>>>>>>> We analyzed the thread dump twice and found that the busiest thread
>>>>>>>>> had the following stack traces.
>>>>>>>>>
>>>>>>>>> 1.
>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:148)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>
>>>>>>>>> 2.
>>>>>>>>> "main" prio=10 tid=0x00007f85ec00a800 nid=0x7843 runnable [0x00007f85f250f000]
>>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>>>         at org.apache.log4j.Category.callAppenders(Category.java:202)
>>>>>>>>>         at org.apache.log4j.Category.forcedLog(Category.java:391)
>>>>>>>>>         at org.apache.log4j.Category.log(Category.java:856)
>>>>>>>>>         at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:177)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.isActiveDataEndpointExists(DataEndpointGroup.java:264)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.access$400(DataEndpointGroup.java:46)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.put(DataEndpointGroup.java:155)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup$EventQueue.access$300(DataEndpointGroup.java:97)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup.publish(DataEndpointGroup.java:94)
>>>>>>>>>         at org.wso2.carbon.databridge.agent.DataPublisher.publish(DataPublisher.java:183)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.publishLogEvents(Unknown Source)
>>>>>>>>>         at org.wso2.carbon.das.smarthome.sample.SmartHomeAgent.main(Unknown Source)
>>>>>>>>>
>>>>>>>>> We suspect that the *isActiveDataEndpointExists()* method in the
>>>>>>>>> *org.wso2.carbon.databridge.agent.endpoint.DataEndpointGroup* class
>>>>>>>>> is called repeatedly because the disruptor ring buffer is filled on
>>>>>>>>> the client side. Not sure why this happens.
>>>>>>>>>
>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>> Senior Software Engineer
>>>>>>>>>
>>>>>>>>> *WSO2, Inc.* lean.enterprise.middleware.
>>>>>>>>>
>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>> *E-mail* : [email protected]
>>>>>>>>> *Skype* : @manindae
>>>>>>>>> *Twitter* : @maninda
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sent from iPhone
>>>>>>
>>>>>> --
>>>>>> *Sinthuja Rajendran*
>>>>>> Associate Technical Lead
>>>>>> WSO2, Inc.: http://wso2.com
>>>>>>
>>>>>> Blog: http://sinthu-rajan.blogspot.com/
>>>>>> Mobile: +94774273955
>
> --
> *Anjana Fernando*
> Senior Technical Lead
> WSO2 Inc. | http://wso2.com
> lean . enterprise . middleware

--
*Sinthuja Rajendran*
Associate Technical Lead
WSO2, Inc.: http://wso2.com

Blog: http://sinthu-rajan.blogspot.com/
Mobile: +94774273955
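[Editor's note] The H2 error in [1] ("Timeout trying to lock table") is the indexer's bulk DELETE exceeding H2's table-lock timeout while the receiver workers hold the lock. As a sketch only, H2 exposes a `LOCK_TIMEOUT` connection setting (in milliseconds) that can be raised in the datasource JDBC URL; the database path below is a placeholder, not taken from this thread:

```
jdbc:h2:repository/database/ANALYTICS_EVENT_STORE;LOCK_TIMEOUT=60000
```

This only widens the timeout window; as noted above, moving the event store to MySQL or another production-recommended database is the actual fix.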
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
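[Editor's note] The "hung" publisher described in the thread dumps above is consistent with a producer blocked on a full bounded buffer: `EventQueue.put` cannot complete until the receiver drains events. A minimal, generic Java sketch (not WSO2 code; the class and event names are invented) shows the same behaviour with a standard bounded queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a producer against a small bounded buffer looks "hung" when the
// consumer stalls, mirroring DataEndpointGroup$EventQueue.put blocking
// while the receiver queue is full.
public class BackpressureDemo {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2); // tiny buffer

        // Fill the buffer; these offers succeed immediately.
        System.out.println(queue.offer("event-1")); // true
        System.out.println(queue.offer("event-2")); // true

        // With no consumer draining, the next offer is rejected. A blocking
        // put() here would park the thread instead -- the producer is not
        // dead, just waiting on a full queue.
        System.out.println(queue.offer("event-3")); // false

        // Once a consumer takes an element, publishing resumes.
        queue.poll();
        System.out.println(queue.offer("event-3")); // true
    }
}
```

Spawning a second agent "fixing" the problem fits this picture: the new agent starts with an empty client-side buffer, so it publishes until its own buffer fills in turn, as Sinthuja observed.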
