abhishekagarwal87 commented on a change in pull request #11973:
URL: https://github.com/apache/druid/pull/11973#discussion_r754830086
##########
File path:
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/PartialDimensionDistributionTask.java
##########
@@ -68,8 +68,9 @@
{
public static final String TYPE = "partial_dimension_distribution";
- // Future work: StringDistribution does not handle inserting NULLs. This is the same behavior as hadoop indexing.
- private static final boolean SKIP_NULL = true;
+ // Do not skip nulls as StringDistribution can handle null values.
+ // This behavior is different from hadoop indexing.
+ private static final boolean SKIP_NULL = false;
Review comment:
Can this be turned on selectively, only when more than one dimension is
being used? I don't know for certain what the impact of not skipping nulls will
be, but that way the impact would be limited to the new range partitioning only.
Alternatively, it could be based on a flag that you pass via the context. Thoughts?
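
To make the two options concrete, here is a rough sketch of how the decision could be derived per task instead of being a constant. This is only an illustration, not the actual Druid code: the class `SkipNullPolicy`, the method `shouldSkipNull`, and the context key `skipNullInDimensionDistribution` are all hypothetical names, and the context is modeled as a plain `Map` rather than the task's real context accessor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Druid: decides whether null dimension
// values should be skipped when building the distribution.
public class SkipNullPolicy {
    // Assumed context key; the real name (if any) would be chosen in the PR.
    static final String CTX_SKIP_NULL = "skipNullInDimensionDistribution";

    static boolean shouldSkipNull(List<String> partitionDimensions, Map<String, ?> context) {
        // Option 2: an explicit flag in the context wins when present.
        Object flag = context.get(CTX_SKIP_NULL);
        if (flag != null) {
            return Boolean.parseBoolean(String.valueOf(flag));
        }
        // Option 1: keep the old behavior (skip nulls) for a single dimension,
        // and only stop skipping nulls for multi-dimension range partitioning.
        return partitionDimensions.size() <= 1;
    }

    public static void main(String[] args) {
        Map<String, Object> noFlag = Collections.emptyMap();
        Map<String, Object> forceSkip = Collections.singletonMap(CTX_SKIP_NULL, true);

        System.out.println(shouldSkipNull(Arrays.asList("dim1"), noFlag));            // true
        System.out.println(shouldSkipNull(Arrays.asList("dim1", "dim2"), noFlag));    // false
        System.out.println(shouldSkipNull(Arrays.asList("dim1", "dim2"), forceSkip)); // true
    }
}
```

With something along these lines, existing single-dimension jobs would keep skipping nulls unless the flag says otherwise, so any behavior change stays confined to multi-dimension range partitioning.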
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]