[ https://issues.apache.org/jira/browse/STORM-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hugo Louro reassigned STORM-2292:
---------------------------------

    Assignee: Hugo Louro

> Kafka spout enhancement, for our of range edge cases
> ----------------------------------------------------
>
>                 Key: STORM-2292
>                 URL: https://issues.apache.org/jira/browse/STORM-2292
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka
>    Affects Versions: 0.10.0
>            Reporter: WayneZhou
>            Assignee: Hugo Louro
>             Fix For: 0.10.0
>
>         Attachments: KafkaSpout.java.txt
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> @hmcl and all, we have communicated via email for a while; going forward,
> let's talk in this thread so everyone is on the same page.
> Based on the spout from the community (written by you), we have made several
> fixes, and it has run quite stably in our production for about 6 months.
> We want to share our latest spout with you. Could you please help review it
> and merge any reasonable fixes into the community version? We want to avoid
> diverging too much from the community version.
> Below are our major fixes:
> For a failed message, in the nextTuple method, the spout originally seeks
> back to the non-contiguous offset so that the failed message is polled again
> for retry. Say we seek back to message 10 for retry: if the Kafka log file
> has been purged and the earliest offset is now 1000, we seek to 10 but are
> reset to 1000 as per the reset policy. We can never poll message 10, so the
> spout stops working.
> Our fix is: we manually catch the out-of-range exception, commit the
> earliest offset first, and then seek to the earliest offset.
> Currently, finding the next offset to commit is very complex under some edge
> cases: a) no message is acked back because a bolt has an issue or cannot
> keep up with the spout's emit rate; b) seek-backs happen frequently, much
> faster than messages are acked back.
> We give each message a status: None, emit, acked, or failed (if the failure
> count exceeds the maximum number of retries, the status is set to acked).
> One of our use cases needs ordering at the partition level, so after seeking
> back for a retry we re-emit all the following messages, whether or not they
> have already been emitted. If possible, maybe you can add an option to
> configure this: either re-emit all messages from the failed one onward, or
> emit only the failed one, as in the current version.
> We record the message counts for acked, failed, and emitted, just for
> statistics.
> Could you please help review this and let us know whether you can merge it
> into the community version? If you have any comments or concerns, please
> feel free to let us know.
> Btw, our code is attached to this Jira.
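
To make the fixes described above easier to discuss, here is a minimal plain-Java sketch of the per-message status bookkeeping and the out-of-range recovery. It is not the attached KafkaSpout.java.txt and it does not call the Kafka client. The class `OffsetTracker` and its methods `emit`, `ack`, `fail`, `commit`, and `seekTarget` are hypothetical names, and the sketch assumes offsets within a partition are contiguous.

```java
import java.util.TreeMap;

/** Hypothetical per-partition tracker; an illustration, not the attached spout. */
class OffsetTracker {
    enum Status { NONE, EMITTED, ACKED, FAILED }

    private final TreeMap<Long, Status> status = new TreeMap<>();
    private final TreeMap<Long, Integer> failures = new TreeMap<>();
    private final int maxRetries;
    private long committed; // next offset to read after a restart

    OffsetTracker(long committed, int maxRetries) {
        this.committed = committed;
        this.maxRetries = maxRetries;
    }

    void emit(long offset) { status.put(offset, Status.EMITTED); }

    void ack(long offset) { status.put(offset, Status.ACKED); }

    /** Records a failure; once retries are exhausted the message counts as acked. */
    void fail(long offset) {
        int count = failures.merge(offset, 1, Integer::sum);
        status.put(offset, count > maxRetries ? Status.ACKED : Status.FAILED);
    }

    Status statusOf(long offset) { return status.getOrDefault(offset, Status.NONE); }

    /** Advances the committed offset over the contiguous run of acked messages. */
    long commit() {
        // Assumption: offsets are contiguous, so the next candidate is committed + 1.
        while (statusOf(committed) == Status.ACKED) {
            status.remove(committed);
            failures.remove(committed);
            committed++;
        }
        return committed;
    }

    /**
     * Chooses the seek position for a retry. If the wanted offset has been
     * purged (it is below the earliest offset still in the log), move the
     * committed offset up to the earliest offset, forget the purged entries,
     * and seek to the earliest offset instead.
     */
    long seekTarget(long wanted, long earliest) {
        if (wanted >= earliest) {
            return wanted;
        }
        status.headMap(earliest).clear();
        failures.headMap(earliest).clear();
        committed = Math.max(committed, earliest);
        return earliest;
    }
}
```

With the numbers from the description, `seekTarget(10, 1000)` returns 1000 and moves the committed offset to 1000, so the spout would resume from the earliest available message instead of repeatedly seeking to a purged offset. Whether to re-emit everything after a failed message or only the failed message is left out of the sketch, since that is the configurable behavior being proposed.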