GitHub user curtisallen opened a pull request:

    https://github.com/apache/incubator-storm/pull/183

    STORM-399 update KafkaConfig.maxOffsetBehind default to be Long.MAX_VALUE

    
    I've recently upgraded to storm and storm-kafka `0.9.2-incubating`, 
replacing the https://github.com/wurstmeister/storm-kafka-0.8-plus spout I was 
using previously.
      
    I had a large Kafka log that I needed to process, so I started my topology
    with:
    ```
    storm.kafka.SpoutConfig spoutConfig = new SpoutConfig....
    spoutConfig.forceFromStart = true;
    ```
    I then needed to make some tweaks to my application code, so I restarted the
    topology with `spoutConfig.forceFromStart = false`, expecting to pick up where I
    left off in my Kafka log. Instead, the Kafka spout started from the latest
    offset. Upon investigation, I found these messages in my Storm worker logs:
    ```
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Read last commit offset from zookeeper: 15266940; old topology_id: ef3f1f89-f64c-4947-b6eb-0c7fb9adb9ea - new topology_id: 5747dba6-c947-4c4f-af4a-4f50a84817bf
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 15266940
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Commit offset 22092614 is more than 100000 behind, resetting to startOffsetTime=-2
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Starting Kafka prd-use1c-pr-08-kafka-kamq-0004:4 from offset 22092614
    ```
    
    Digging into the storm-kafka spout, I found this line:
     
https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java#L95
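
    A minimal sketch of the check that appears to be behind the log message
    above, using the offsets from the log. The class and method names are
    hypothetical, the logic is a simplification rather than the actual
    `PartitionManager` code, and it assumes that 22092614 was the latest offset
    in the partition at startup:

    ```java
    public class OffsetBehindCheck {
        // The default implied by the "more than 100000 behind" log line.
        public static final long DEFAULT_MAX_OFFSET_BEHIND = 100000L;

        // Hypothetical helper: returns the offset the spout would resume from.
        // If the committed offset trails the latest offset by more than
        // maxOffsetBehind, the committed offset is discarded.
        public static long resolveStartOffset(long committedOffset,
                                              long latestOffset,
                                              long maxOffsetBehind) {
            if (latestOffset - committedOffset > maxOffsetBehind) {
                return latestOffset;
            }
            return committedOffset;
        }

        public static void main(String[] args) {
            long committed = 15266940L; // "Read last commit offset from zookeeper"
            long latest = 22092614L;    // assumed latest offset (see lead-in)

            // Gap between the two offsets: 6825674 messages.
            System.out.println(latest - committed);

            // With the default limit the gap exceeds 100000, so the spout jumps ahead.
            System.out.println(resolveStartOffset(committed, latest, DEFAULT_MAX_OFFSET_BEHIND));

            // With Long.MAX_VALUE the limit can never be exceeded, so the
            // committed offset is kept.
            System.out.println(resolveStartOffset(committed, latest, Long.MAX_VALUE));
        }
    }
    ```

    Under these assumptions, the gap of 6825674 messages is far above the
    default limit, which matches the spout resuming from 22092614 instead of
    15266940.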
    
    To work around this problem, I ended up setting my spout config like so:
    ```
    spoutConfig.maxOffsetBehind = Long.MAX_VALUE;
    ```
    Now, finally, to my question:
    
    Why, by default, would the Kafka spout skip to the latest offset when the
    committed offset is more than 100000 behind?
    
    This seems like a bad default value: the spout skipped over months of data
    without any warning.
    
    This pull request changes the default value of `KafkaConfig.maxOffsetBehind` to `Long.MAX_VALUE`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/curtisallen/incubator-storm STORM-399-kafka-spout-increase-default

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-storm/pull/183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #183
    
----
commit 2b1d6cfe9b3d567234246b2e01bfb02b64131302
Author: Curtis Allen <[email protected]>
Date:   2014-07-09T21:43:14Z

    STORM-399 update KafkaConfig.maxOffsetBehind default to be Long.MAX_VALUE

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
