hqx871 commented on code in PR #13303:
URL: https://github.com/apache/druid/pull/13303#discussion_r1096492578


##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -476,22 +502,29 @@ void emitDimValueCounts(
       final byte[] groupKey = buf.array();
 
       // Emit row-counter value.
-      write(context, groupKey, new DimValueCount("", "", 1));
-
-      for (final Map.Entry<String, Iterable<String>> dimAndValues : dims.entrySet()) {
-        final String dim = dimAndValues.getKey();
-
-        if (partitionDimension == null || partitionDimension.equals(dim)) {
-          final Iterable<String> dimValues = dimAndValues.getValue();
-
-          if (Iterables.size(dimValues) == 1) {
-            // Emit this value.
-            write(context, groupKey, new DimValueCount(dim, Iterables.getOnlyElement(dimValues), 1));
+      write(context, groupKey, new DimValueCount(Collections.emptyList(), StringTuple.create(), 1));
+
+      Iterator<List<String>> dimensionGroupIterator = dimensionGroupingSet.iterator();

Review Comment:
   > Ah, I see it now. Thanks for the clarification.
   > 
   > In the case of single_dim too, I was wondering if we even need this check 
anymore. I don't think we can submit a spec with null partition dimension even 
for single_dim, at least for index_parallel. There is a check in 
`ParallelIndexTuningConfig` which validates that partition dimensions are 
always specified.
   > 
   > We should have a similar check in `HadoopTuningConfig` too. Secretly 
choosing a partition dimension behind the scenes is probably not so great. It's 
better to fail if no partition dimension has been specified. What do you think?
   
   This is just to stay compatible with the old logic. As you can see from the 
deleted code at line 484, when `partitionDimension` is null, the old code 
emits the values of every dimension.
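
To make the old behavior concrete, here is a minimal standalone sketch of the deleted loop. It is an illustration only, not the actual Druid code: the class `LegacyEmitSketch` and the method `emittedDimValues` are hypothetical names, the dimensions are modeled as a plain `Map<String, List<String>>`, and the emit step is replaced by collecting `dim=value` strings. Only the single-value branch visible in the diff is modeled; how the old code handled multi-value dimensions is not shown in the hunk, so the sketch simply skips them (an assumption).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LegacyEmitSketch {
  /**
   * Mirrors the condition of the deleted loop: a dimension is considered when
   * partitionDimension is null (no dimension chosen) or equals that dimension.
   * Returns "dim=value" strings instead of writing to a Hadoop context.
   */
  public static List<String> emittedDimValues(
      Map<String, List<String>> dims,
      String partitionDimension
  ) {
    List<String> emitted = new ArrayList<>();
    for (Map.Entry<String, List<String>> dimAndValues : dims.entrySet()) {
      String dim = dimAndValues.getKey();
      if (partitionDimension == null || partitionDimension.equals(dim)) {
        List<String> dimValues = dimAndValues.getValue();
        if (dimValues.size() == 1) {
          // Single-valued dimension: emit its only value.
          emitted.add(dim + "=" + dimValues.get(0));
        }
        // Multi-value handling is not visible in the diff; skipped here (assumption).
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    Map<String, List<String>> dims = new LinkedHashMap<>();
    dims.put("country", List.of("US"));
    dims.put("tags", List.of("a", "b"));
    dims.put("device", List.of("phone"));

    // Null partition dimension: every single-valued dimension is emitted.
    System.out.println(emittedDimValues(dims, null));
    // Explicit partition dimension: only that dimension is emitted.
    System.out.println(emittedDimValues(dims, "device"));
  }
}
```

With a null `partitionDimension` the sketch returns values for both `country` and `device`, which is the behavior the new grouping-set code needs to preserve for compatibility.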



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

