wjhypo commented on a change in pull request #9810:
URL: https://github.com/apache/druid/pull/9810#discussion_r418961446
##########
File path:
core/src/main/java/org/apache/druid/timeline/partition/HashBasedNumberedShardSpec.java
##########
@@ -73,7 +81,7 @@ public boolean isCompatible(Class<? extends ShardSpec> other)
@Override
public boolean isInChunk(long timestamp, InputRow inputRow)
{
- return (((long) hash(timestamp, inputRow)) - getPartitionNum()) % getPartitions() == 0;
+ return (Math.abs(hash(timestamp, inputRow)) - getPartitionNum()) % getPartitions() == 0;
Review comment:
Query-time hashing should yield the same value as the one produced at ingestion
time, so this looks like a genuine bug. Before this change no one was using this
API, which is why the issue never surfaced. During indexing, the segment a
record goes into is determined by the `getLookup` function below, where the
32-bit hash is forced to be non-negative by applying `Math.abs`. Here, by
contrast, the code cast the 32-bit hash to a long, apparently intending the same
effect, but a widening int-to-long conversion is sign-extending: if the hash is
0xff_ff_ff_ff (i.e. -1), `Math.abs(0xff_ff_ff_ff)` is different from
`(long) 0xff_ff_ff_ff`. It's better to make the change here than in `getLookup`
for backward compatibility; otherwise users who have already used hash-based
partitioning couldn't leverage pruning, and pruning would be wrong, unless they
redid the ingestion to update their segments.
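A minimal standalone sketch (not part of the patch; class name is illustrative) showing why the two conversions disagree for the value 0xff_ff_ff_ff:

```java
public class HashSignDemo
{
  public static void main(String[] args)
  {
    // 0xff_ff_ff_ff as a 32-bit int literal is -1.
    int hash = 0xff_ff_ff_ff;

    // Widening int-to-long conversion sign-extends, so the value stays -1.
    long viaCast = (long) hash;      // -1

    // Math.abs flips the sign, yielding 1.
    long viaAbs = Math.abs(hash);    // 1

    System.out.println(viaCast);     // prints -1
    System.out.println(viaAbs);      // prints 1
  }
}
```

So `(viaCast - partitionNum) % partitions` and `(viaAbs - partitionNum) % partitions` can select different partitions, which is exactly the query-time vs. ingestion-time mismatch described above.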
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]