vvcephei commented on a change in pull request #8896:
URL: https://github.com/apache/kafka/pull/8896#discussion_r442397523
##########
File path:
streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java
##########
@@ -415,19 +418,20 @@ public void restore() {
// for restoring active and updating standby we may prefer different poll time
// in order to make sure we call the main consumer#poll in time.
// TODO: once we move ChangelogReader to a separate thread this may no longer be a concern
- polledRecords = restoreConsumer.poll(state.equals(ChangelogReaderState.STANDBY_UPDATING) ? Duration.ZERO : pollTime);
+ polledRecords = restoreConsumer.poll(state == ChangelogReaderState.STANDBY_UPDATING ? Duration.ZERO : pollTime);
Review comment:
trivial cleanup
##########
File path:
streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java
##########
@@ -415,19 +418,20 @@ public void restore() {
// for restoring active and updating standby we may prefer different poll time
// in order to make sure we call the main consumer#poll in time.
// TODO: once we move ChangelogReader to a separate thread this may no longer be a concern
- polledRecords = restoreConsumer.poll(state.equals(ChangelogReaderState.STANDBY_UPDATING) ? Duration.ZERO : pollTime);
+ polledRecords = restoreConsumer.poll(state == ChangelogReaderState.STANDBY_UPDATING ? Duration.ZERO : pollTime);
} catch (final InvalidOffsetException e) {
- log.warn("Encountered {} fetching records from restore consumer for partitions {}, it is likely that " +
+ log.warn("Encountered " + e.getClass().getName() +
+ " fetching records from restore consumer for partitions " + e.partitions() + ", it is likely that " +
"the consumer's position has fallen out of the topic partition offset range because the topic was " +
"truncated or compacted on the broker, marking the corresponding tasks as corrupted and re-initializing" +
- " it later.", e.getClass().getName(), e.partitions());
+ " it later.", e);
Review comment:
Added the exception itself as the "cause" of the warning. The message of
the InvalidOffsetException is pretty good at explaining the root cause.
##########
File path:
streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java
##########
@@ -446,6 +450,38 @@ public void restore() {
}
maybeUpdateLimitOffsetsForStandbyChangelogs();
+
+ maybeLogRestorationProgress();
Review comment:
This is the main change. Once every ten seconds, we will log the
progress for each active restoring changelog.
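
For readers skimming this thread, the throttling idea can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: the class `RestoreProgressLogger`, its `maybeLog` method, and the "remaining records" map are hypothetical names, and the caller passes in the current time explicitly.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of interval-based progress logging; not the PR's implementation.
final class RestoreProgressLogger {
    // "Once every ten seconds", expressed in milliseconds.
    private static final long INTERVAL_MS = 10_000L;

    private long lastLogMs;

    RestoreProgressLogger(final long nowMs) {
        this.lastLogMs = nowMs;
    }

    // Returns a progress message if at least INTERVAL_MS has elapsed since the
    // last message, otherwise returns an empty Optional and does nothing.
    // The map goes from a changelog partition name to its remaining record count.
    Optional<String> maybeLog(final long nowMs, final Map<String, Long> remainingByChangelog) {
        if (nowMs - lastLogMs < INTERVAL_MS) {
            return Optional.empty();
        }
        lastLogMs = nowMs;

        final StringBuilder message = new StringBuilder("Restoration in progress:");
        for (final Map.Entry<String, Long> entry : remainingByChangelog.entrySet()) {
            message.append("\n\t")
                   .append(entry.getKey())
                   .append(": ")
                   .append(entry.getValue())
                   .append(" records remaining");
        }
        return Optional.of(message.toString());
    }
}
```

Calling `maybeLog` on every pass of the restore loop is cheap, because most calls return immediately after the time comparison.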
##########
File path:
streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java
##########
@@ -415,19 +418,20 @@ public void restore() {
// for restoring active and updating standby we may prefer different poll time
// in order to make sure we call the main consumer#poll in time.
// TODO: once we move ChangelogReader to a separate thread this may no longer be a concern
- polledRecords = restoreConsumer.poll(state.equals(ChangelogReaderState.STANDBY_UPDATING) ? Duration.ZERO : pollTime);
+ polledRecords = restoreConsumer.poll(state == ChangelogReaderState.STANDBY_UPDATING ? Duration.ZERO : pollTime);
} catch (final InvalidOffsetException e) {
- log.warn("Encountered {} fetching records from restore consumer for partitions {}, it is likely that " +
+ log.warn("Encountered " + e.getClass().getName() +
+ " fetching records from restore consumer for partitions " + e.partitions() + ", it is likely that " +
"the consumer's position has fallen out of the topic partition offset range because the topic was " +
"truncated or compacted on the broker, marking the corresponding tasks as corrupted and re-initializing" +
- " it later.", e.getClass().getName(), e.partitions());
+ " it later.", e);
final Map<TaskId, Collection<TopicPartition>> taskWithCorruptedChangelogs = new HashMap<>();
for (final TopicPartition partition : e.partitions()) {
final TaskId taskId = changelogs.get(partition).stateManager.taskId();
taskWithCorruptedChangelogs.computeIfAbsent(taskId, k -> new HashSet<>()).add(partition);
}
- throw new TaskCorruptedException(taskWithCorruptedChangelogs);
+ throw new TaskCorruptedException(taskWithCorruptedChangelogs, e);
Review comment:
Also added the cause to the thrown exception.
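
As a small illustration of why passing the cause matters, here is a sketch using only standard Java exception chaining. `CorruptedTaskException` and `CauseChainingDemo` are hypothetical stand-ins, not the real `TaskCorruptedException` or its constructor; the point is just that the original exception stays reachable through `getCause()` and shows up as "Caused by:" in stack traces.

```java
// Hypothetical stand-in for an exception type that accepts a cause.
final class CorruptedTaskException extends RuntimeException {
    CorruptedTaskException(final String message, final Throwable cause) {
        // RuntimeException(String, Throwable) records the cause for getCause()
        // and for the "Caused by:" section of the stack trace.
        super(message, cause);
    }
}

final class CauseChainingDemo {
    // Simulates a low-level failure and rethrows it wrapped, keeping the cause.
    static void restore() {
        try {
            // Assumed example failure; the message text is made up for illustration.
            throw new IllegalStateException("offset 5 is out of range");
        } catch (final IllegalStateException e) {
            throw new CorruptedTaskException("changelog is corrupted", e);
        }
    }
}
```

Without the second constructor argument, a caller catching the wrapper would see only "changelog is corrupted" and lose the out-of-range detail.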