kfaraz commented on code in PR #13303:
URL: https://github.com/apache/druid/pull/13303#discussion_r1065327862
##########
core/src/main/java/org/apache/druid/indexer/partitions/SingleDimensionPartitionsSpec.java:
##########
@@ -146,6 +154,44 @@ public String getForceGuaranteedRollupIncompatiblityReason()
     return FORCE_GUARANTEED_ROLLUP_COMPATIBLE;
   }
+  @Override
+  public List<List<String>> getDimensionGroupingSet(DimensionsSpec dimensionsSpec)
Review Comment:
This method always returns a singleton list, so I think it should just
return a `List<String>` rather than a list of lists.
This whole logic should be kept in `DeterminePartitionsJob` itself. We can
just have logic like this:
```
List<String> partitionDimensions = partitionsSpec.getPartitionDimensions();
if (partitionDimensions.isEmpty()) {
// determine the partitions from the DimensionsSpec
}
```
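For instance, the fallback could be as simple as reading the declared dimension
names from the schema (a rough sketch only, assuming the `DimensionsSpec` is
already available at that point in `DeterminePartitionsJob`):
```
List<String> partitionDimensions = partitionsSpec.getPartitionDimensions();
if (partitionDimensions.isEmpty()) {
  // Fall back to the dimensions declared in the ingestion schema's DimensionsSpec.
  partitionDimensions = dimensionsSpec.getDimensionNames();
}
```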
##########
core/src/main/java/org/apache/druid/indexer/partitions/DimensionRangePartitionsSpec.java:
##########
@@ -144,6 +148,27 @@ public List<String> getPartitionDimensions()
     return partitionDimensions;
   }
+  public List<List<String>> getDimensionGroupingSet(DimensionsSpec dimensionsSpec)
Review Comment:
The logic in these new methods should be kept in `DeterminePartitionsJob`
itself, as these methods are only used for Hadoop ingestion.
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -626,33 +677,39 @@ protected void innerReduce(Context context, SortableBytes keyBytes, Iterable<Dim
      final DimValueCount firstDvc = iterator.next();
      final long totalRows = firstDvc.numRows;
-     if (!"".equals(firstDvc.dim) || !"".equals(firstDvc.value)) {
+     if (!Collections.emptyList().equals(firstDvc.dims) || firstDvc.values.size() != 0) {
Review Comment:
```suggestion
if (!firstDvc.dims.isEmpty() || !firstDvc.values.isEmpty()) {
```
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -437,13 +458,18 @@ protected void innerMap(
  static class DeterminePartitionsDimSelectionMapperHelper
  {
    private final HadoopDruidIndexerConfig config;
-   private final String partitionDimension;
+   private final List<List<String>> dimensionGroupingSet;
    private final Map<Long, Integer> intervalIndexes;
+   private final boolean supportNullValue;
-   DeterminePartitionsDimSelectionMapperHelper(HadoopDruidIndexerConfig config, String partitionDimension)
+   DeterminePartitionsDimSelectionMapperHelper(HadoopDruidIndexerConfig config)
    {
      this.config = config;
-     this.partitionDimension = partitionDimension;
+
+     DimensionRangePartitionsSpec spec = (DimensionRangePartitionsSpec) config.getPartitionsSpec();
+     final DimensionsSpec dimensionsSpec = config.getSchema().getDataSchema().getDimensionsSpec();
+     this.dimensionGroupingSet = new ArrayList<>(spec.getDimensionGroupingSet(dimensionsSpec));
+     this.supportNullValue = spec.supportNullValue();
Review Comment:
This can be determined by using the count of partition dimensions instead.
```suggestion
this.supportNullValue = partitionDimensions.size() > 1;
```
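For illustration, the constructor could then be trimmed to something like the
sketch below (names such as `partitionDimensions` are only illustrative and
assume the `List<String>` change suggested in the earlier comment):
```
DeterminePartitionsDimSelectionMapperHelper(HadoopDruidIndexerConfig config)
{
  this.config = config;

  final DimensionRangePartitionsSpec spec = (DimensionRangePartitionsSpec) config.getPartitionsSpec();
  final List<String> partitionDimensions = spec.getPartitionDimensions();

  // Per the suggestion above: null-value support follows from partitioning on multiple dimensions.
  this.supportNullValue = partitionDimensions.size() > 1;

  // ... remaining initialization (intervalIndexes, etc.) stays as it is today.
}
```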
##########
indexing-hadoop/src/main/java/org/apache/druid/indexer/DeterminePartitionsJob.java:
##########
@@ -476,22 +502,29 @@ void emitDimValueCounts(
      final byte[] groupKey = buf.array();
      // Emit row-counter value.
-     write(context, groupKey, new DimValueCount("", "", 1));
-
-     for (final Map.Entry<String, Iterable<String>> dimAndValues : dims.entrySet()) {
-       final String dim = dimAndValues.getKey();
-
-       if (partitionDimension == null || partitionDimension.equals(dim)) {
-         final Iterable<String> dimValues = dimAndValues.getValue();
-
-         if (Iterables.size(dimValues) == 1) {
-           // Emit this value.
-           write(context, groupKey, new DimValueCount(dim, Iterables.getOnlyElement(dimValues), 1));
+     write(context, groupKey, new DimValueCount(Collections.emptyList(), StringTuple.create(), 1));
+
+     Iterator<List<String>> dimensionGroupIterator = dimensionGroupingSet.iterator();
Review Comment:
This should be simplified as the `dimensionGroupingSet` is always a
singleton list.
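For example (a sketch only; with the `List<String>` change suggested earlier,
the grouping set and this lookup would disappear entirely):
```
// The grouping set is always a singleton, so the iterator can simply be replaced with:
final List<String> groupDims = Iterables.getOnlyElement(dimensionGroupingSet);
```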