gianm commented on a change in pull request #8644: Fix Kinesis resharding issues
URL: https://github.com/apache/incubator-druid/pull/8644#discussion_r332836956
 
 

 ##########
 File path: extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/supervisor/KinesisSupervisor.java
 ##########
 @@ -212,14 +225,47 @@ protected void scheduleReporting(ScheduledExecutorService reportingExec)
     // not yet implemented, see issue #6739
   }
 
+
+  /**
+   * We try to parse the shard number of the shard ID, using a BigInteger because the Kinesis shard ID can be
+   * up to 128 characters. The shard number is used preferentially because it provides a fixed and easily predictable
+   * mapping from shard to task group number.
+   *
+   * If we can't parse the shard number from the ID, then we fall back to hashing the shard ID string.
 
 Review comment:
   I don't think a parse failure can happen today, given how Kinesis names shards. It might happen in the future if Amazon changes the format of shard identifiers for some reason. I don't think that's likely, since it would probably break a ton of stuff, so Amazon would hesitate to do it. So using only the BigInteger path (and throwing an exception if an ID can't be parsed for some reason) is a risk, but a small one. On the other hand, treating shard IDs as opaque strings carries zero risk.
   
   I read the code in the PR as trying to have it both ways: a 'nice' behavior (assigning shards round-robin based on shard number) and also zero risk of an unexpected change by Amazon breaking us.
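   For illustration, a minimal sketch of that "both ways" approach, assuming shard IDs currently look like `shardId-000000000001` and that the task group is the key modulo the task count. The class and method names (`ShardNumberAssigner`, `taskGroupFor`) are hypothetical, not the PR's or Druid's API, and SHA-256 stands in for whatever hash the PR actually uses:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not Druid's API): maps a Kinesis shard ID to a task group,
 * preferring the numeric shard number and falling back to a hash of the whole ID.
 */
final class ShardNumberAssigner
{
  // Assumption: current Kinesis shard IDs look like "shardId-000000000001".
  private static final Pattern SHARD_ID = Pattern.compile("shardId-(\\d+)");

  private ShardNumberAssigner()
  {
  }

  static int taskGroupFor(String shardId, int taskCount)
  {
    if (taskCount <= 0) {
      throw new IllegalArgumentException("taskCount must be positive: " + taskCount);
    }
    Matcher m = SHARD_ID.matcher(shardId);
    // matches() requires the whole ID to fit the pattern. BigInteger is used because
    // a shard ID can be up to 128 characters, so the digit run may not fit in a long.
    BigInteger key = m.matches() ? new BigInteger(m.group(1)) : sha256(shardId);
    // mod() always returns a value in [0, taskCount), so intValue() is safe.
    return key.mod(BigInteger.valueOf(taskCount)).intValue();
  }

  // Fallback for IDs in an unexpected format: treat the ID as an opaque string.
  private static BigInteger sha256(String s)
  {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
      return new BigInteger(1, digest); // signum 1: interpret the digest as non-negative
    }
    catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support SHA-256.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args)
  {
    String[] ids = {"shardId-000000000000", "shardId-000000000001", "shardId-000000000005", "someFutureFormat/abc"};
    for (String id : ids) {
      System.out.println(id + " -> task group " + taskGroupFor(id, 4));
    }
  }
}
```

   With four task groups, `shardId-000000000000` through `shardId-000000000003` land in groups 0 through 3 and `shardId-000000000005` lands in group 1; only an ID that doesn't match the assumed pattern takes the hash path.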
   
   FWIW, if I were writing the patch, I would go with the zero-risk SHA path only. Shard assignment would be less human-predictable, but the code would be shorter and still avoid forward-compatibility risk, which seems worth it to me.
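   A minimal sketch of the SHA-only alternative, for comparison. It assumes SHA-256 (the comment only says "SHA") and a key-modulo-task-count assignment; `OpaqueShardAssigner` and `taskGroupFor` are hypothetical names, not Druid's API:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper (not Druid's API): the "SHA path only" variant. The shard ID is
 * treated as an opaque string, so nothing depends on its format.
 */
final class OpaqueShardAssigner
{
  private OpaqueShardAssigner()
  {
  }

  static int taskGroupFor(String shardId, int taskCount)
  {
    if (taskCount <= 0) {
      throw new IllegalArgumentException("taskCount must be positive: " + taskCount);
    }
    final byte[] digest;
    try {
      // Assumption: SHA-256 as the concrete SHA variant.
      digest = MessageDigest.getInstance("SHA-256").digest(shardId.getBytes(StandardCharsets.UTF_8));
    }
    catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
    // Interpret the digest as a non-negative integer and reduce it into [0, taskCount).
    return new BigInteger(1, digest).mod(BigInteger.valueOf(taskCount)).intValue();
  }

  public static void main(String[] args)
  {
    String[] ids = {"shardId-000000000000", "shardId-000000000001", "someFutureFormat/abc"};
    for (String id : ids) {
      System.out.println(id + " -> task group " + taskGroupFor(id, 4));
    }
  }
}
```

   The trade-off is the one described above: this version is shorter and format-agnostic, but consecutive shards no longer map round-robin, so with only a few shards some task groups can receive more shards than others.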
   
   In the end, I'm inclined to leave the choice to the author, @jon-wei.
