jon-wei commented on a change in pull request #5734: Multiple dimension partitioning spec
URL: https://github.com/apache/incubator-druid/pull/5734#discussion_r207026515
 
 

 ##########
 File path: indexing-hadoop/src/main/java/io/druid/indexer/DeterminePartitionsJob.java
 ##########
 @@ -454,22 +457,45 @@ public void emitDimValueCounts(
       final byte[] groupKey = buf.array();
 
       // Emit row-counter value.
-      write(context, groupKey, new DimValueCount("", "", 1));
-
-      for (final Map.Entry<String, Iterable<String>> dimAndValues : dims.entrySet()) {
-        final String dim = dimAndValues.getKey();
-
-        if (partitionDimension == null || partitionDimension.equals(dim)) {
-          final Iterable<String> dimValues = dimAndValues.getValue();
-
-          if (Iterables.size(dimValues) == 1) {
-            // Emit this value.
-            write(context, groupKey, new DimValueCount(dim, Iterables.getOnlyElement(dimValues), 1));
-          } else {
-            // This dimension is unsuitable for partitioning. Poison it by emitting a negative value.
-            write(context, groupKey, new DimValueCount(dim, "", -1));
+      write(context, groupKey, new DimValueCount(Collections.emptyList(), Collections.emptyList(), 1));
+
+      // If partition dimensions is empty, then write DVC for each dim, so that it can be used to compute best dim
+      // to partition on.
+      if (partitionDimensions.isEmpty()) {
+        // Handle auto partitioning
+        for (final Map.Entry<String, Iterable<String>> dimAndValues : dims.entrySet()) {
+          final String dim = dimAndValues.getKey();
+
+          if (partitionDimensions.isEmpty() || partitionDimensions.contains(dim)) {
 
 Review comment:
   Is this check needed here? Because it sits inside the enclosing `if (partitionDimensions.isEmpty())` block, it looks like `partitionDimensions.isEmpty()` will always be true and `partitionDimensions.contains(dim)` will always be false.
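
   To illustrate the point, here is a minimal sketch, not the actual `DeterminePartitionsJob` code. It assumes `partitionDimensions` is a `List<String>` that is not modified inside the loop; the class name `RedundantCheckSketch` and both method names are hypothetical. It compares the guarded loop as written in the diff with a version that drops the inner check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the emit loop; it only records which dims would be emitted.
class RedundantCheckSketch {

  // Same shape as the diff: outer isEmpty() guard plus the inner check under review.
  static List<String> withInnerCheck(List<String> partitionDimensions, List<String> dims) {
    final List<String> emitted = new ArrayList<>();
    if (partitionDimensions.isEmpty()) {
      for (final String dim : dims) {
        // Inside this block the list is empty, so isEmpty() is true and contains(dim) is false.
        if (partitionDimensions.isEmpty() || partitionDimensions.contains(dim)) {
          emitted.add(dim);
        }
      }
    }
    return emitted;
  }

  // Same loop with the inner check removed.
  static List<String> withoutInnerCheck(List<String> partitionDimensions, List<String> dims) {
    final List<String> emitted = new ArrayList<>();
    if (partitionDimensions.isEmpty()) {
      emitted.addAll(dims);
    }
    return emitted;
  }

  public static void main(String[] args) {
    final List<String> dims = Arrays.asList("country", "device");
    final List<String> none = Collections.emptyList();
    final List<String> some = Collections.singletonList("country");

    // Both variants should agree for empty and non-empty partitionDimensions.
    System.out.println(withInnerCheck(none, dims).equals(withoutInnerCheck(none, dims)));
    System.out.println(withInnerCheck(some, dims).equals(withoutInnerCheck(some, dims)));
  }
}
```

   Under those assumptions the two variants emit the same dimensions in every case, which is why the inner condition appears safe to drop.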
