Github user jianbzhou commented on the pull request:

    https://github.com/apache/storm/pull/1131#issuecomment-221922582
  
    @hmcl, currently if user give firstPollOffsetStrategy=UNCOMMITTED_LATEST or 
LATEST, the spout will not work, because if a kafka consumer re-balance 
happened, the offset will be seeked to the end, and there will be lots of 
messages not consumed/emitted/acked&failed, so will never find the next 
continuous offset to commit, so the log will keep showing that "Non continuous 
offset found"......
    
    I have a questions here - if a spout read and emit one message, I assume 
storm will ensure the message will be acked or failed without exception, right? 
because if it is possible that one emitted message failed to get acked or 
failed message under some strange situations, it means we cannot find the 
continuous message to commit, which will directly break the spout. Could you 
please help confirm if my assumption is correct?
    
    If my assumption is not correct - which means one emitted message may not 
be able to get acked or failed message back, then we must change the spout(need 
a timeout setting if failed to find next continuous message to commit) - 
currently the spout will always find the next continuous message to commit, it 
will try forever...
    
    due to the spout will always find the next continuous message to commit, we 
need to be cautious for below method:
    private boolean poll() {
            return !waitingToEmit() && emitted.size() < 
kafkaSpoutConfig.getMaxUncommittedOffsets();
        }
    if the MaxUncommittedOffsets is too small, the spout will frequently stop 
polling from kafka, if a rebalance happened and seek back to the failed 
message, at this moment if the spout stop polling, will also cause the spout 
failed to find the next committed message. Currently we set this value to 
200000 and seems working fine for now.
    Looking forward to hearing from you! thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

Reply via email to