Github user jianbzhou commented on the issue:

    https://github.com/apache/storm/pull/1131
  
    @hmcl and all, we have communicated via email for a while and going forward 
let's talk in this thread so everyone is in same page.
    Base on the spout from the community(written by you), we have several fixes 
and it worked quite stable in our production for about 6 months.
    
    We want to share the latest spout to you and could you please kindly help 
review and merge to the community version if any fix is reasonable? we want to 
avoid diverging too much from the community version.
    
    Below are our major fixes:
    1.  For failed message, in next tuple method, originally the spout seek 
back to the non-continuous offset, so the failed message will be polled again 
for retry, say we seek back to message 10 for retry, now if kafka log file was 
purged, earliest offset is 1000, it means we will seek to 10 but reset to 1000 
as per the reset policy, and we cannot poll the message 10, so spout not work.
    Our fix is: we manually catch the out of range exception, commit the offset 
to earliest offset first, then seek to the earliest offset
    
    2.  Currently the way to find next committed offset is very complex, under 
some edge cases – a), if no message acked back because bolt has some issue or 
cannot catch up with the spout emit; b) seek back is happened frequently and it 
is much faster than the message be acked back
    We give each message a status – None, emit, acked, failed(if failed 
number is bigger than the maximum retry, set to acked)
    3.  One of our use cases need ordering in partition level, so after seek 
back for retry, we re-emit all the follow messages again no matter they have 
emitted or not, if possible, maybe you can give an option here to configure it 
– either re-emit all the message from the failed one, or just emit the failed 
one, same as current version.
    4.  We record the message count for acked, failed, emitted, just for 
statistics.
    
    Could you please kindly help review and let us know if you can merge it 
into the community version? Any comments/concern pls feel free to let us know. 
Btw, I just send the latest code to you via email. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to