gianm commented on a change in pull request #8644: Fix Kinesis resharding issues
URL: https://github.com/apache/incubator-druid/pull/8644#discussion_r332606065
##########
File path:
extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/supervisor/KinesisSupervisor.java
##########
@@ -314,4 +356,94 @@ protected boolean useExclusiveStartSequenceNumberForNonFirstSequence()
   {
     return true;
   }
+
+  @Override
+  protected Map<String, OrderedSequenceNumber<String>> filterDeadShardsFromStartingOffsets(
+      Map<String, OrderedSequenceNumber<String>> startingOffsets
+  )
+  {
+    Map<String, OrderedSequenceNumber<String>> filteredOffsets = new HashMap<>();
+    for (Map.Entry<String, OrderedSequenceNumber<String>> entry : startingOffsets.entrySet()) {
+      if (!entry.getValue().get().equals(KinesisSequenceNumber.END_OF_SHARD_MARKER)) {
+        filteredOffsets.put(entry.getKey(), entry.getValue());
+      } else {
+        log.info("Excluding shard[%s] because it has reached EOS.", entry.getKey());
+      }
+    }
+    return filteredOffsets;
+  }
+
+  @Override
+  protected void cleanupDeadShards(Set<String> expiredShards)
+  {
+    log.info("Cleaning up dead shards: " + expiredShards);
+
+    final KinesisDataSourceMetadata dataSourceMetadata =
+        (KinesisDataSourceMetadata) getIndexerMetadataStorageCoordinator().getDataSourceMetadata(dataSource);
Review comment:
The introduction of `getIndexerMetadataStorageCoordinator()` seems like an
abstraction break: the parent class is meant to handle interactions with the
metadata store, and the subclass is meant to handle interactions and
customizations relevant to the upstream system.
It is used for two things:
- Getting the currently-committed datasource metadata.
- Later on, possibly resetting it.
These actions, when done outside of the parent class, raise some alarms about
possible racy behavior.
Maybe the currently-committed datasource metadata could be passed into
`cleanupDeadShards`? And it could return new metadata, which the parent class
could commit after doing some sanity checks?
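A rough sketch of what that shape could look like, using simplified stand-in
types rather than the real Druid classes: metadata is modeled as a plain map of
shard id to committed sequence number, and the names `CleanupSketch`,
`ParentSupervisor`, `KinesisLikeSupervisor`, `runCleanup`, and `getCommitted`
are hypothetical. The sanity check shown (the subclass may only drop shards) is
just one possible check, not a statement of what the parent class should
enforce.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins, not the real Druid classes: metadata is modeled as a
// plain map of shard id -> committed sequence number.
class CleanupSketch
{
  /** Parent: the only class that reads or writes the (simulated) metadata store. */
  abstract static class ParentSupervisor
  {
    private Map<String, String> committed;

    ParentSupervisor(Map<String, String> initial)
    {
      this.committed = new HashMap<>(initial);
    }

    /** Subclass hook: receives a read-only snapshot, returns the proposed new metadata. */
    protected abstract Map<String, String> cleanupDeadShards(
        Map<String, String> currentMetadata,
        Set<String> expiredShards
    );

    /** Read, delegate, sanity-check, and commit while holding one lock. */
    final synchronized void runCleanup(Set<String> expiredShards)
    {
      Map<String, String> snapshot = Collections.unmodifiableMap(new HashMap<>(committed));
      Map<String, String> proposed = cleanupDeadShards(snapshot, expiredShards);

      // Sanity check: the subclass may only drop shards, never add or rewrite them.
      for (Map.Entry<String, String> e : proposed.entrySet()) {
        if (!e.getValue().equals(snapshot.get(e.getKey()))) {
          throw new IllegalStateException("Unexpected change for shard " + e.getKey());
        }
      }
      committed = new HashMap<>(proposed);
    }

    synchronized Map<String, String> getCommitted()
    {
      return new HashMap<>(committed);
    }
  }

  /** Subclass: decides what to drop, but never touches the metadata store itself. */
  static class KinesisLikeSupervisor extends ParentSupervisor
  {
    KinesisLikeSupervisor(Map<String, String> initial)
    {
      super(initial);
    }

    @Override
    protected Map<String, String> cleanupDeadShards(
        Map<String, String> currentMetadata,
        Set<String> expiredShards
    )
    {
      Map<String, String> updated = new HashMap<>(currentMetadata);
      updated.keySet().removeAll(expiredShards);
      return updated;
    }
  }
}
```

With this split, the read and the write of the committed metadata happen
inside a single synchronized parent method, so the subclass has no window in
which to act on stale metadata.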