wjhypo commented on a change in pull request #9810:
URL: https://github.com/apache/druid/pull/9810#discussion_r418961446



##########
File path: 
core/src/main/java/org/apache/druid/timeline/partition/HashBasedNumberedShardSpec.java
##########
@@ -73,7 +81,7 @@ public boolean isCompatible(Class<? extends ShardSpec> other)
   @Override
   public boolean isInChunk(long timestamp, InputRow inputRow)
   {
-    return (((long) hash(timestamp, inputRow)) - getPartitionNum()) % getPartitions() == 0;
+    return (Math.abs(hash(timestamp, inputRow)) - getPartitionNum()) % getPartitions() == 0;

Review comment:
       Query-time hashing should yield the same value as the one produced at ingestion time, so this looks like a bug. Before this change no caller used this API, which is why the issue never surfaced. During indexing, the segment a record goes into is chosen by the `getLookup` function below, where the 32-bit hash is forced to be non-negative by applying `Math.abs`. Here, by contrast, the 32-bit hash was cast to a long, which sign-extends rather than making the value positive: if the hashed value is 0xff_ff_ff_ff, `Math.abs(0xff_ff_ff_ff)` is different from `(long) 0xff_ff_ff_ff`. It's better to make the change here than in `getLookup`, for backward compatibility: otherwise users who have already used hash-based partitioning couldn't leverage pruning (the pruning would be wrong) unless they re-ingested to update their segments.
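A minimal standalone sketch of the discrepancy described above (this is not the Druid class itself; `hash` and the partition count of 8 are illustrative assumptions):

```java
public class HashSignDemo
{
  public static void main(String[] args)
  {
    // 0xff_ff_ff_ff is -1 when interpreted as a signed 32-bit int.
    int hash = 0xff_ff_ff_ff;

    // Widening an int to long sign-extends, so the value stays negative.
    long widened = (long) hash;       // -1L
    int absolute = Math.abs(hash);    // 1

    System.out.println(widened);      // -1
    System.out.println(absolute);     // 1

    // With e.g. 8 partitions, the two conventions pick different buckets,
    // so query-time pruning would disagree with ingestion-time placement.
    System.out.println(Math.floorMod(widened, 8L));          // 7
    System.out.println(Math.floorMod((long) absolute, 8L));  // 1
  }
}
```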
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
