[
https://issues.apache.org/jira/browse/STORM-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708652#comment-15708652
]
Matthias Klein edited comment on STORM-2077 at 11/30/16 1:54 PM:
-----------------------------------------------------------------
The problem seems to be the following piece of code:
{code}
public class KafkaSpout<K, V> extends BaseRichSpout {
    //...
    private void doSeekRetriableTopicPartitions() {
        //...
        if (offsetAndMeta != null) {
            // seek to the next offset that is ready to commit in the next commit cycle
            kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);
        } else {
            // Seek to last committed offset <-- indeed seeks to the end of the partition
            kafkaConsumer.seekToEnd(toArrayList(rtp));
        }
{code}
However, this code seeks to the end of the partition, not to the last committed
offset. The following change seems to repair the issue:
{code}
if (offsetAndMeta != null) {
    // seek to the next offset that is ready to commit in the next commit cycle
    kafkaConsumer.seek(rtp, offsetAndMeta.offset() + 1);
} else {
    OffsetAndMetadata committed = kafkaConsumer.committed(rtp);
    if (committed == null) {
        // Nothing committed yet
        kafkaConsumer.seekToBeginning(toArrayList(rtp));
    } else {
        // seek to the entry after the last committed offset
        kafkaConsumer.seek(rtp, committed.offset() + 1);
    }
}
{code}
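The effect of the change can be sketched in plain Java without a running broker. The class, method names, and sentinel `LOG_END_OFFSET` below are illustrative stand-ins for the real `KafkaConsumer` calls, not Storm or Kafka API: when no offset is ready to commit, the old code jumps to the end of the partition (so the failed, uncommitted tuples are never re-emitted), while the fixed code falls back to the last committed offset + 1, or to the beginning of the partition if nothing has been committed yet.

```java
import java.util.OptionalLong;

// Minimal sketch of the seek decision in doSeekRetriableTopicPartitions().
// All names here are hypothetical; the sentinel stands in for seekToEnd().
public class SeekDecision {
    static final long LOG_END_OFFSET = 100L; // pretend the partition holds offsets 0..99

    // Old behavior: nothing ready to commit -> seek to the end of the partition.
    static long oldSeek(OptionalLong readyToCommit) {
        return readyToCommit.isPresent() ? readyToCommit.getAsLong() + 1 : LOG_END_OFFSET;
    }

    // Fixed behavior: fall back to last committed offset + 1, or the beginning.
    static long fixedSeek(OptionalLong readyToCommit, OptionalLong committed) {
        if (readyToCommit.isPresent()) return readyToCommit.getAsLong() + 1;
        return committed.isPresent() ? committed.getAsLong() + 1 : 0L;
    }

    public static void main(String[] args) {
        // Nothing committed yet, tuples 0..99 failed: the old code skips them all.
        System.out.println(oldSeek(OptionalLong.empty()));                         // 100 -> failed tuples lost
        System.out.println(fixedSeek(OptionalLong.empty(), OptionalLong.empty())); // 0 -> replay from start
        // Offsets up to 41 committed: replay resumes at 42.
        System.out.println(fixedSeek(OptionalLong.empty(), OptionalLong.of(41L))); // 42
    }
}
```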
> KafkaSpout doesn't retry failed tuples
> --------------------------------------
>
> Key: STORM-2077
> URL: https://issues.apache.org/jira/browse/STORM-2077
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-kafka
> Affects Versions: 1.0.2
> Reporter: Tobias Maier
>
> KafkaSpout does not retry all failed tuples.
> We used the following configuration:
> {code}
> Map<String, Object> props = new HashMap<>();
> props.put(KafkaSpoutConfig.Consumer.GROUP_ID, "c1");
> props.put(KafkaSpoutConfig.Consumer.KEY_DESERIALIZER, ByteArrayDeserializer.class.getName());
> props.put(KafkaSpoutConfig.Consumer.VALUE_DESERIALIZER, ByteArrayDeserializer.class.getName());
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, broker.bootstrapServer());
> KafkaSpoutStreams kafkaSpoutStreams =
>         new KafkaSpoutStreams.Builder(FIELDS_KAFKA_EVENT, new String[]{"test-topic"}).build();
> KafkaSpoutTuplesBuilder<byte[], byte[]> kafkaSpoutTuplesBuilder =
>         new KafkaSpoutTuplesBuilder.Builder<>(new KeyValueKafkaSpoutTupleBuilder("test-topic")).build();
> KafkaSpoutRetryService retryService = new KafkaSpoutLoggedRetryExponentialBackoff(
>         KafkaSpoutLoggedRetryExponentialBackoff.TimeInterval.milliSeconds(1),
>         KafkaSpoutLoggedRetryExponentialBackoff.TimeInterval.milliSeconds(1),
>         3,
>         KafkaSpoutLoggedRetryExponentialBackoff.TimeInterval.seconds(1));
> KafkaSpoutConfig<byte[], byte[]> config =
>         new KafkaSpoutConfig.Builder<>(props, kafkaSpoutStreams, kafkaSpoutTuplesBuilder, retryService)
>                 .setFirstPollOffsetStrategy(UNCOMMITTED_LATEST)
>                 .setMaxUncommittedOffsets(30)
>                 .setOffsetCommitPeriodMs(10)
>                 .setMaxRetries(3)
>                 .build();
> kafkaSpout = new org.apache.storm.kafka.spout.KafkaSpout<>(config);
> {code}
> The downstream bolt fails every tuple, and we expect that all of those tuples will
> be replayed. But that is not the case for every tuple.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)