jihoonson commented on code in PR #12428:
URL: https://github.com/apache/druid/pull/12428#discussion_r848642474
##########
processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java:
##########
@@ -1460,4 +1456,33 @@ private static <T extends TimeAndDimsPointer> T reorderRowPointerColumns(
    );
  }
}
+
+  private static class DimensionMergerUtil
Review Comment:
I renamed it to `DimensionsSpecInspector`, but I'm not sure what the right name for this is. The code structure in `IndexMerger` is not well organized, which is probably one of the reasons it's hard to find a good name. Perhaps we should add a wrapper around `DimensionMerger` and `DimensionsSpecInspector` that creates the `ColumnDescriptor`. Or there could be an even better refactoring idea than that. Either way, I would like to do that refactoring in a follow-up PR rather than in this one.
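
Just to make the idea concrete, here is a rough sketch of the kind of wrapper I have in mind. Everything below is hypothetical (the class name, `shouldStore`, `hasOnlyNulls`, and `makeColumnDescriptor` are placeholders for whatever the real refactoring would expose), not code from this PR:

```java
// Hypothetical sketch only: pair a DimensionMerger with the DimensionsSpecInspector
// so the "should this column get a ColumnDescriptor at all?" decision lives in one place.
private static class DescriptorMakingDimensionMerger
{
  private final DimensionMerger merger;                 // assumed to expose hasOnlyNulls() / makeColumnDescriptor()
  private final DimensionsSpecInspector specInspector;  // assumed to expose shouldStore(dimension)

  DescriptorMakingDimensionMerger(DimensionMerger merger, DimensionsSpecInspector specInspector)
  {
    this.merger = merger;
    this.specInspector = specInspector;
  }

  /**
   * Returns a ColumnDescriptor for the merged dimension, or null when the column
   * contains only nulls and the DimensionsSpec says it should not be stored.
   */
  @Nullable
  ColumnDescriptor makeColumnDescriptorIfStored(String dimension)
  {
    if (merger.hasOnlyNulls() && !specInspector.shouldStore(dimension)) {
      return null;
    }
    return merger.makeColumnDescriptor();
  }
}
```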
##########
core/src/test/java/org/apache/druid/data/input/impl/JsonInputFormatTest.java:
##########
@@ -72,4 +72,26 @@ public void testEquals()
.withIgnoredFields("objectMapper")
.verify();
}
+
+ @Test
+ public void testUseFieldDiscovery_setKeepNullColumnsByDefault()
+ {
+ final JsonInputFormat format = new JsonInputFormat(
+ new JSONPathSpec(true, null),
+ null,
+ null
+ );
+ Assert.assertTrue(format.isKeepNullColumns());
+ }
+
+ @Test
+ public void testUseFieldDiscovery_doNotChangeKeepNullColumnsUserSets()
Review Comment:
Added one.
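
For reference, the added test could look roughly like the sketch below, assuming the same three-argument constructor (flattenSpec, featureSpec, keepNullColumns) used in the test above:

```java
// Sketch: when the user explicitly sets keepNullColumns = false, turning on
// useFieldDiscovery in the flattenSpec should not flip it back to true.
@Test
public void testUseFieldDiscovery_doNotChangeKeepNullColumnsUserSets()
{
  final JsonInputFormat format = new JsonInputFormat(
      new JSONPathSpec(true, null),
      null,
      false
  );
  Assert.assertFalse(format.isKeepNullColumns());
}
```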
##########
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/HashPartitionMultiPhaseParallelIndexingWithNullColumnTest.java:
##########
@@ -19,37 +19,61 @@
package org.apache.druid.indexing.common.task.batch.parallel;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
import org.apache.druid.data.input.InputFormat;
+import org.apache.druid.data.input.InputRowSchema;
import org.apache.druid.data.input.InputSource;
+import org.apache.druid.data.input.InputSourceReader;
+import org.apache.druid.data.input.InputSplit;
+import org.apache.druid.data.input.SplitHintSpec;
+import org.apache.druid.data.input.impl.ByteEntity;
import org.apache.druid.data.input.impl.DimensionSchema;
import org.apache.druid.data.input.impl.DimensionsSpec;
import org.apache.druid.data.input.impl.InlineInputSource;
+import org.apache.druid.data.input.impl.InputEntityIteratingReader;
import org.apache.druid.data.input.impl.JsonInputFormat;
+import org.apache.druid.data.input.impl.SplittableInputSource;
import org.apache.druid.data.input.impl.TimestampSpec;
import org.apache.druid.indexer.TaskState;
+import org.apache.druid.indexer.partitions.DimensionRangePartitionsSpec;
import org.apache.druid.indexer.partitions.HashedPartitionsSpec;
+import org.apache.druid.indexer.partitions.PartitionsSpec;
import org.apache.druid.indexing.common.LockGranularity;
import org.apache.druid.indexing.common.task.Tasks;
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.parsers.JSONPathFieldSpec;
+import org.apache.druid.java.util.common.parsers.JSONPathFieldType;
+import org.apache.druid.java.util.common.parsers.JSONPathSpec;
import org.apache.druid.segment.indexing.DataSchema;
import org.apache.druid.segment.indexing.granularity.UniformGranularitySpec;
import org.apache.druid.timeline.DataSegment;
import org.joda.time.Interval;
import org.junit.Assert;
import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import javax.annotation.Nullable;
+import java.io.File;
+import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
+import java.util.stream.Stream;
+@RunWith(Parameterized.class)
 public class HashPartitionMultiPhaseParallelIndexingWithNullColumnTest extends AbstractMultiPhaseParallelIndexingTest
Review Comment:
Oops, thanks. I forgot to fix it before making this PR. Renamed it now.
##########
indexing-service/src/test/java/org/apache/druid/indexing/common/task/batch/parallel/HashPartitionMultiPhaseParallelIndexingWithNullColumnTest.java:
##########
@@ -165,37 +335,105 @@ public void testIngestNullColumn_storeEmptyColumnsOff_shouldNotStoreEmptyColumns
    Set<DataSegment> segments = getIndexingServiceClient().getPublishedSegments(task);
    Assert.assertFalse(segments.isEmpty());
+   final List<DimensionSchema> expectedDimensions = DimensionsSpec.getDefaultSchemas(
+       Collections.singletonList("ts")
+   );
    for (DataSegment segment : segments) {
-     Assert.assertFalse(segment.getDimensions().contains("unknownDim"));
+     Assert.assertEquals(expectedDimensions.size(), segment.getDimensions().size());
+     for (int i = 0; i < expectedDimensions.size(); i++) {
+       Assert.assertEquals(expectedDimensions.get(i).getName(), segment.getDimensions().get(i));
+     }
    }
  }
private InputSource getInputSource() throws JsonProcessingException
{
final ObjectMapper mapper = getObjectMapper();
- final List<Map<String, Object>> rows = ImmutableList.of(
- ImmutableMap.of(
- "ts", "2022-01-01",
- "dim1", "val1",
- "dim2", "val11"
- ),
- ImmutableMap.of(
- "ts", "2022-01-02",
- "dim1", "val2",
- "dim2", "val12"
- ),
- ImmutableMap.of(
- "ts", "2022-01-03",
- "dim1", "val3",
- "dim2", "val13"
- )
- );
+ final List<Map<String, Object>> rows = new ArrayList<>();
Review Comment:
Refactored as suggested.
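
For context, the suggestion was to build the test rows in a loop instead of hardcoding three nearly identical maps; a minimal sketch of that shape (the exact row values in the PR may differ):

```java
// Sketch: construct the rows programmatically; field names mirror the original test data.
final List<Map<String, Object>> rows = new ArrayList<>();
for (int i = 1; i <= 3; i++) {
  final Map<String, Object> row = new HashMap<>();
  row.put("ts", StringUtils.format("2022-01-0%d", i));
  row.put("dim1", "val" + i);
  row.put("dim2", "val1" + i);
  rows.add(row);
}
```

The rows would then presumably be serialized with `mapper` and wrapped in the `InlineInputSource`, as in the original version of `getInputSource()`.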