[ https://issues.apache.org/jira/browse/STORM-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139831#comment-16139831 ]
Jungtaek Lim commented on STORM-2231:
-------------------------------------

[~chemist] [~kevinconaway]
I think I have found a suspicious spot, but the possible fixes may affect performance, so I would like to spend some time running performance tests on them. I have not reproduced the issue and don't know whether it is easy to reproduce, so if one of you can help test and verify the patch, that would be really helpful.

> NULL in DisruptorQueue while multi-threaded ack
> -----------------------------------------------
>
>                 Key: STORM-2231
>                 URL: https://issues.apache.org/jira/browse/STORM-2231
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.1, 1.1.0
>            Reporter: Alexander Kharitonov
>            Priority: Critical
>
> I use a simple topology with one spout (9 workers) and one bolt (9 workers).
> I have topology.backpressure.enable: false in storm.yaml.
> Spouts send about 10 000 000 tuples in 10 minutes. Pending for the spout is 80 000.
> Bolts buffer their tuples for 60 seconds, then flush them to the database and ack the tuples in parallel (10 threads).
> I read that OutputCollector can be used from many threads safely, so I use it that way.
> I don't have any bottleneck in the bolts (flushing to the database) or the spouts (Kafka spout), but about 2% of tuples fail due to tuple processing timeout (the failures are recorded in the spout stats only).
> I am sure that the bolts ack all tuples, but some of the acks don't reach the spouts.
> While acking from multiple threads I see many errors like this in the worker logs:
> 2016-12-01 13:21:10.741 o.a.s.u.DisruptorQueue [ERROR] NULL found in disruptor-executor[3 3]-send-queue:853877
> I tried using a synchronized wrapper around OutputCollector to fix the error, but it didn't help.
> I found a workaround that helps me: I do all processing in the bolt in multiple threads but call the OutputCollector.ack methods in a single separate thread.
> I think Storm has an error in the multi-threaded use of OutputCollector.
> If my topology has much less load, like 500 000 tuples per 10 minutes, then I don't lose any acks.
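
For anyone who wants to try the workaround described in the report (process in many threads, ack from one dedicated thread), here is a minimal sketch of the idea. It is not taken from the reporter's topology and has not been run against Storm: the class name SingleThreadAcker is hypothetical, T stands in for Storm's Tuple, and the Consumer passed to the constructor is assumed to wrap the bolt's OutputCollector ack call. It only sidesteps concurrent calls into the collector; it does not address the underlying DisruptorQueue issue.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Funnels acks from many worker threads through one dedicated thread.
 * Hypothetical helper: T stands in for Storm's Tuple, and ackFn is assumed
 * to wrap the collector's ack call (for example, tuple -> collector.ack(tuple)).
 */
class SingleThreadAcker<T> implements AutoCloseable {
    // A single-thread executor runs submitted tasks one at a time, in
    // submission order, on the same thread.
    private final ExecutorService ackThread = Executors.newSingleThreadExecutor();
    private final Consumer<T> ackFn;

    SingleThreadAcker(Consumer<T> ackFn) {
        this.ackFn = ackFn;
    }

    /** Safe to call from any worker thread; the ack itself runs on ackThread. */
    void ack(T tuple) {
        ackThread.execute(() -> ackFn.accept(tuple));
    }

    /** Stops accepting new acks and waits for the queued ones to be delivered. */
    @Override
    public void close() throws InterruptedException {
        ackThread.shutdown();
        ackThread.awaitTermination(30, TimeUnit.SECONDS);
    }
}
```

In a bolt, the flushing threads would call acker.ack(tuple) instead of calling the collector directly, and the acker would be closed in the bolt's cleanup. The 30-second wait in close() is an arbitrary choice for the sketch.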