[ 
https://issues.apache.org/jira/browse/STORM-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498065#comment-14498065
 ] 

Robert Joseph Evans commented on STORM-773:
-------------------------------------------

OK, I have a bit more information.  I traced through the failure logs, comparing
the batch that failed against an earlier batch that succeeded in the same run
of the test.  I found that in this case it was the commit that failed, not
the actual processing, and that neither of the two count bolts involved
processed the commit message properly.  They received it and finished
executing, but did not ack the commit message and did not emit anything new
to the coordinator related to the commit.

FAILURE CASE:

Summary:
{code}
spout/coordinator:7 -> count:3 [commit] received done (!!!!no-ack!!!!)
spout/coordinator:7 -> count:4 [commit] received done (!!!!no-ack!!!!)
spout/coordinator:7 -> 2bb7b476-3c35-42dd-bad6-519e8276d51c:1 [commit] received ack done
spout/coordinator:7 -> __acker:2 [__ack_init -3670038820065525526] received ack done
2bb7b476-3c35-42dd-bad6-519e8276d51c:1 -> ?:2 [__ack_ack 7261381623926650396] received ack done
{code}

For logs run
{code}grep 8932423455392050678 failure.txt{code}

SUCCESS CASE:
Summary:
{code}
spout/coordinator:7 -> count:3 [commit] received ack done
spout/coordinator:7 -> count:4 [commit] received ack done
spout/coordinator:7 -> 38f9f1bc-338c-4486-806b-ac946d3920e1:1 [commit] received ack done
38f9f1bc-338c-4486-806b-ac946d3920e1:1 -> __acker:2 [__ack_ack 1031730095793096003] received ack done
spout/coordinator:7 -> __acker:2 [__ack_init 3961628406540289946] received ack done
count:4 -> 38f9f1bc-338c-4486-806b-ac946d3920e1:1 [coord] received ack done
count:3 -> 38f9f1bc-338c-4486-806b-ac946d3920e1:1 [coord] received ack done
count:4 -> __acker:2 [__ack_ack 31105666502947637] received ack done
count:3 -> __acker:2 [__ack_ack 2686861581078832178] received ack done
38f9f1bc-338c-4486-806b-ac946d3920e1:1 -> __acker:2 [__ack_ack 4862926263550768684] received ack done
38f9f1bc-338c-4486-806b-ac946d3920e1:1 -> __acker:2 [__ack_ack 6841047554966168562] received ack done
__acker:2 -> spout/coordinator:7 [__ack_ack] received spout-ack
{code}

For logs run
{code}grep 3333285042640611010 failure.txt{code}
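
The comparison between the two summaries can also be done mechanically. The following is only a minimal Python sketch, not part of the Storm test suite. It assumes the hand-written summary format shown above (one hop per line, in the form "source -> target [stream ...] status words"), and the names LINE_RE, ACK_WORDS, parse_summary and unacked are hypothetical helpers introduced for illustration. It lists every hop that was received but never acked.

```python
import re

# Assumed shape of one summary line, taken from the hand-written summaries
# in this comment: "<source> -> <target> [<stream> <optional id>] <status>".
LINE_RE = re.compile(
    r"^(?P<src>\S+) -> (?P<dst>\S+) "
    r"\[(?P<stream>[^\]\s]+)[^\]]*\] (?P<status>.*)$"
)

# Status words that count as an acknowledgement in the summaries.
ACK_WORDS = {"ack", "spout-ack"}

# The failure summary from this comment, with wrapped lines joined.
FAILURE_SUMMARY = """\
spout/coordinator:7 -> count:3 [commit] received done (!!!!no-ack!!!!)
spout/coordinator:7 -> count:4 [commit] received done (!!!!no-ack!!!!)
spout/coordinator:7 -> 2bb7b476-3c35-42dd-bad6-519e8276d51c:1 [commit] received ack done
spout/coordinator:7 -> __acker:2 [__ack_init -3670038820065525526] received ack done
2bb7b476-3c35-42dd-bad6-519e8276d51c:1 -> ?:2 [__ack_ack 7261381623926650396] received ack done
"""


def parse_summary(text):
    """Return a list of (src, dst, stream, set_of_status_words) per hop."""
    hops = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # silently skip lines that do not look like a hop
            status = set(m.group("status").split())
            hops.append((m.group("src"), m.group("dst"), m.group("stream"), status))
    return hops


def unacked(hops):
    """Hops that were received but carry no acknowledgement status word."""
    return [
        (src, dst, stream)
        for src, dst, stream, status in hops
        if "received" in status and not (status & ACK_WORDS)
    ]


if __name__ == "__main__":
    for src, dst, stream in unacked(parse_summary(FAILURE_SUMMARY)):
        print(f"{src} -> {dst} [{stream}] was received but never acked")
```

Run against the failure summary, this flags only the two commit hops into count:3 and count:4, which matches the manual reading above.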

I am not sure yet why the count bolts did not respond properly to the
commit, but I will see what I can come up with.

> backtype.storm.transactional-test fails periodically with timeout
> -----------------------------------------------------------------
>
>                 Key: STORM-773
>                 URL: https://issues.apache.org/jira/browse/STORM-773
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Robert Joseph Evans
>         Attachments: failure.txt, success.txt
>
>
> I'm not totally sure what is happening here, but fairly frequently now on my
> Mac running JDK8 backtype.storm.transactional-test will time out.
> test-transactional-topology-restart seems to be the test in there that is
> getting the timeouts.
> I made some modifications to the test to just run that one test case, and to
> turn topology.debug on. I captured examples of it working and failing.  I'll
> attach the log files shortly.  I have not really had much time to dig
> into this, so I am not totally sure what is happening here.  I can see from
> the logs that on the first run of the topology the failure case only emits 10
> batches, whereas the successful case outputs many more.  On the second run
> of the topology the failure case starts off at batch 11 but does not go
> beyond it, whereas the successful case keeps going.
> I'll try to find some time to look into it more, but I'm not sure how much
> time I will have in the near future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
