[ 
https://issues.apache.org/jira/browse/CRUNCH-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742438#comment-15742438
 ] 

Andrew Olson commented on CRUNCH-630:
-------------------------------------

The current workaround for this bug is to set auto.offset.reset=earliest in the 
Kafka connection properties when creating the KafkaSource (or alternatively 
org.apache.crunch.kafka.connection.properties.auto.offset.reset=earliest in the 
Pipeline's Configuration).
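
As a rough illustration of the workaround, the sketch below builds the two
settings mentioned above using only java.util.Properties. The class and
method names (OffsetResetWorkaround, withEarliestReset, prefixedKey) and the
bootstrap.servers value are hypothetical placeholders, not part of Crunch;
only the property name and the prefix are taken from this comment. The
resulting Properties would then be handed to the KafkaSource when it is
created.

```java
import java.util.Properties;

// Hypothetical helper class; not part of Crunch or Kafka.
public class OffsetResetWorkaround {
    // Prefix taken from the Configuration key quoted in the comment above.
    static final String CRUNCH_KAFKA_PREFIX =
        "org.apache.crunch.kafka.connection.properties.";

    // Returns a copy of the given connection properties with
    // auto.offset.reset forced to "earliest".
    static Properties withEarliestReset(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    // Builds the key to use when the setting goes into the Pipeline's
    // Configuration instead of the Kafka connection properties.
    static String prefixedKey(String kafkaProperty) {
        return CRUNCH_KAFKA_PREFIX + kafkaProperty;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        Properties props = withEarliestReset(base);
        System.out.println(props.getProperty("auto.offset.reset"));
        System.out.println(prefixedKey("auto.offset.reset") + "=earliest");
    }
}
```

Copying the properties rather than mutating them keeps the caller's original
connection settings untouched, which may matter if the same Properties
object is shared with other consumers.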

We might consider making that a config override like the serializers [1], or at 
least flipping the default from latest to earliest if it's not specified.

[1] 
https://github.com/apache/crunch/blob/master/crunch-kafka/src/main/java/org/apache/crunch/kafka/KafkaSource.java#L156-L165

> KafkaRecordReader keeps retrying to poll data when the offset is reset to 
> latest offset
> ---------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-630
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-630
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Pooja Dhondge
>
> We recently saw this behavior: if the offset the reader is trying to read 
> from no longer exists in Kafka because of the retention policy, the offset 
> is reset to latest (the default) and the KafkaRecordReader keeps retrying 
> beyond the limit configured by KAFKA_EMPTY_RETRY_ATTEMPTS_KEY:
> {noformat}
> ...crunch.kafka.inputformat.KafkaRecordReader: No records retrieved but 
> pending offsets to consume therefore polling again. Attempt 17/10
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)