imply-cheddar commented on code in PR #13561:
URL: https://github.com/apache/druid/pull/13561#discussion_r1049122636


##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplier.java:
##########
@@ -574,22 +579,34 @@ public BitmapColumnIndex forValue(@Nullable String value)
         public double estimateSelectivity(int totalRows)
         {
           if (longValue == null) {
-            return (double) getBitmap(localDictionary.indexOf(0)).size() / 
totalRows;
+            if (inputNull) {
+              return (double) getBitmap(localDictionary.indexOf(0)).size() / 
totalRows;

Review Comment:
   This is one of those hard-codings of null as 0.  I wonder if maybe we should 
add an `indexOfNull()` method that would only be implemented by the things that 
actually have null as 0 as an extra safe-guard to ensure that we only do that 
when we know that we are using null.  The local implementation can special case 
`0` more safely.
   
   What do you think?



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplier.java:
##########
@@ -336,17 +336,21 @@ public BitmapColumnIndex forValue(@Nullable String value)
         @Override
         public double estimateSelectivity(int totalRows)
         {
-          return (double) getBitmap(
-              
localDictionary.indexOf(stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value)))
-          ).size() / totalRows;
+          final int globalId = 
stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value));
+          if (globalId < 0) {
+            return 0.0;
+          }
+          return (double) getBitmap(localDictionary.indexOf(globalId)).size() 
/ totalRows;
         }
 
         @Override
         public <T> T computeBitmapResult(BitmapResultFactory<T> 
bitmapResultFactory)
         {
-          return bitmapResultFactory.wrapDimensionValue(
-              
getBitmap(localDictionary.indexOf(stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value))))
-          );
+          final int globalId = 
stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value));
+          if (globalId < 0) {
+            return 
bitmapResultFactory.wrapDimensionValue(bitmapFactory.makeEmptyImmutableBitmap());
+          }
+          return 
bitmapResultFactory.wrapDimensionValue(getBitmap(localDictionary.indexOf(globalId)));

Review Comment:
   We seem to do the same general checks between these two methods on a regular 
basis.  Maybe we should add an `AtomicReference<>` to the return object and 
another method that builds and stores an object.  It seems like the 3 possible 
outcomes are
   
   1. We are looking for `null`
   2. We are looking for a value that doesn't exist
   3. We are looking for a value that does exist
   
   If the Reference was to an object that just maintained those 3 states, we 
could do that once, cache it and then implement each of these methods off of 
those states.  I'm making this suggestion as I think this structuring might be 
less bug prone (and potentially even faster as I believe both methods get 
called in the course of a query, so not doing the same lookups multiple times 
could be a gain, though likely very small).



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralDictionaryEncodedColumn.java:
##########
@@ -156,7 +156,11 @@ public String lookupName(int id)
   @Override
   public int lookupId(String name)
   {
-    return dictionary.indexOf(getIdFromGlobalDictionary(name));
+    final int globalId = getIdFromGlobalDictionary(name);
+    if (globalId < 0) {
+      return -1;
+    }

Review Comment:
   Users of this aren't going to rely on the whole "here's the spot to insert" 
semantics, right?



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralDictionaryEncodedColumn.java:
##########
@@ -509,6 +527,124 @@ public boolean isNull()
         };
       }
     }
+    if (singleType == null) {
+      return new ColumnValueSelector<Object>()
+      {
+
+        private PeekableIntIterator nullIterator = 
nullBitmap.peekableIterator();
+        private int nullMark = -1;
+        private int offsetMark = -1;
+
+        @Nullable
+        @Override
+        public Object getObject()
+        {
+          final int localId = column.get(offset.getOffset());
+          final int globalId = dictionary.get(localId);
+          if (globalId < globalDictionary.size()) {
+            return 
StringUtils.fromUtf8Nullable(globalDictionary.get(globalId));
+          } else if (globalId < globalDictionary.size() + 
globalLongDictionary.size()) {
+            return globalLongDictionary.get(globalId - adjustLongId);
+          } else {
+            return globalDoubleDictionary.get(globalId - adjustDoubleId);
+          }
+        }
+
+        @Override
+        public float getFloat()
+        {
+          final int localId = column.get(offset.getOffset());
+          final int globalId = dictionary.get(localId);
+          if (globalId == 0) {
+            // zero
+            assert NullHandling.replaceWithDefault();

Review Comment:
   If we need the assert, do we not need an if check instead?



##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralDictionaryEncodedColumn.java:
##########
@@ -146,7 +146,7 @@ public String lookupName(int id)
     final int globalId = dictionary.get(id);
     if (globalId < globalDictionary.size()) {
       return StringUtils.fromUtf8Nullable(globalDictionary.get(globalId));
-    } else if (globalId < adjustLongId + globalLongDictionary.size()) {
+    } else if (globalId < globalDictionary.size() + 
globalLongDictionary.size()) {

Review Comment:
   Why are these numbers different from each other?  Or is this change for 
readability rather than correctness?  If it is for readability, should we not 
do the same thing in `NestedFieldLiteralColumnIndexSupplier.getIndexes`?



##########
processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierTest.java:
##########
@@ -216,72 +223,124 @@ private void smokeTest(NestedDataComplexColumn column) 
throws IOException
     Assert.assertEquals(ImmutableSet.of(ColumnType.LONG), 
column.getColumnTypes(xPath));
     Assert.assertEquals(ColumnType.LONG, 
column.getColumnHolder(xPath).getCapabilities().toColumnType());
     ColumnValueSelector<?> xSelector = column.makeColumnValueSelector(xPath, 
offset);
+    DimensionSelector xDimSelector = column.makeDimensionSelector(xPath, 
offset, null);
     ColumnIndexSupplier xIndexSupplier = column.getColumnIndexSupplier(xPath);
     Assert.assertNotNull(xIndexSupplier);
     StringValueSetIndex xValueIndex = 
xIndexSupplier.as(StringValueSetIndex.class);
+    DruidPredicateIndex xPredicateIndex = 
xIndexSupplier.as(DruidPredicateIndex.class);
     NullValueIndex xNulls = xIndexSupplier.as(NullValueIndex.class);
 
     final List<NestedPathPart> yPath = NestedPathFinder.parseJsonPath("$.y");
     Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE), 
column.getColumnTypes(yPath));
     Assert.assertEquals(ColumnType.DOUBLE, 
column.getColumnHolder(yPath).getCapabilities().toColumnType());
     ColumnValueSelector<?> ySelector = column.makeColumnValueSelector(yPath, 
offset);
+    DimensionSelector yDimSelector = column.makeDimensionSelector(yPath, 
offset, null);
     ColumnIndexSupplier yIndexSupplier = column.getColumnIndexSupplier(yPath);
     Assert.assertNotNull(yIndexSupplier);
     StringValueSetIndex yValueIndex = 
yIndexSupplier.as(StringValueSetIndex.class);
+    DruidPredicateIndex yPredicateIndex = 
yIndexSupplier.as(DruidPredicateIndex.class);
     NullValueIndex yNulls = yIndexSupplier.as(NullValueIndex.class);
 
     final List<NestedPathPart> zPath = NestedPathFinder.parseJsonPath("$.z");
     Assert.assertEquals(ImmutableSet.of(ColumnType.STRING), 
column.getColumnTypes(zPath));
     Assert.assertEquals(ColumnType.STRING, 
column.getColumnHolder(zPath).getCapabilities().toColumnType());
     ColumnValueSelector<?> zSelector = column.makeColumnValueSelector(zPath, 
offset);
+    DimensionSelector zDimSelector = column.makeDimensionSelector(zPath, 
offset, null);
     ColumnIndexSupplier zIndexSupplier = column.getColumnIndexSupplier(zPath);
     Assert.assertNotNull(zIndexSupplier);
     StringValueSetIndex zValueIndex = 
zIndexSupplier.as(StringValueSetIndex.class);
+    DruidPredicateIndex zPredicateIndex = 
zIndexSupplier.as(DruidPredicateIndex.class);
     NullValueIndex zNulls = zIndexSupplier.as(NullValueIndex.class);
 
-    Assert.assertEquals(ImmutableList.of(xPath, yPath, zPath), 
column.getNestedFields());
+    final List<NestedPathPart> vPath = NestedPathFinder.parseJsonPath("$.v");
+    Assert.assertEquals(ImmutableSet.of(ColumnType.STRING, ColumnType.LONG, 
ColumnType.DOUBLE), column.getColumnTypes(vPath));
+    Assert.assertEquals(ColumnType.STRING, 
column.getColumnHolder(vPath).getCapabilities().toColumnType());
+    ColumnValueSelector<?> vSelector = column.makeColumnValueSelector(vPath, 
offset);
+    DimensionSelector vDimSelector = column.makeDimensionSelector(vPath, 
offset, null);
+    ColumnIndexSupplier vIndexSupplier = column.getColumnIndexSupplier(vPath);
+    Assert.assertNotNull(vIndexSupplier);
+    StringValueSetIndex vValueIndex = 
vIndexSupplier.as(StringValueSetIndex.class);
+    DruidPredicateIndex vPredicateIndex = 
vIndexSupplier.as(DruidPredicateIndex.class);
+    NullValueIndex vNulls = vIndexSupplier.as(NullValueIndex.class);
+
+    Assert.assertEquals(ImmutableList.of(vPath, xPath, yPath, zPath), 
column.getNestedFields());
 
     for (int i = 0; i < data.size(); i++) {
       Map row = data.get(i);
       Assert.assertEquals(
           JSON_MAPPER.writeValueAsString(row),
           
JSON_MAPPER.writeValueAsString(StructuredData.unwrap(rawSelector.getObject()))
       );
-      if (row.containsKey("x")) {
-        Assert.assertEquals(row.get("x"), xSelector.getObject());
-        Assert.assertEquals(row.get("x"), xSelector.getLong());
-        
Assert.assertTrue(xValueIndex.forValue(String.valueOf(row.get("x"))).computeBitmapResult(resultFactory).get(i));
-        
Assert.assertFalse(xNulls.forNull().computeBitmapResult(resultFactory).get(i));
-      } else {
-        Assert.assertNull(xSelector.getObject());
-        Assert.assertTrue(xSelector.isNull());
-        
Assert.assertTrue(xValueIndex.forValue(null).computeBitmapResult(resultFactory).get(i));
-        
Assert.assertTrue(xNulls.forNull().computeBitmapResult(resultFactory).get(i));
-      }
-      if (row.containsKey("y")) {
-        Assert.assertEquals(row.get("y"), ySelector.getObject());
-        Assert.assertEquals(row.get("y"), ySelector.getDouble());
-        
Assert.assertTrue(yValueIndex.forValue(String.valueOf(row.get("y"))).computeBitmapResult(resultFactory).get(i));
-        
Assert.assertFalse(yNulls.forNull().computeBitmapResult(resultFactory).get(i));
-      } else {
-        Assert.assertNull(ySelector.getObject());
-        Assert.assertTrue(ySelector.isNull());
-        
Assert.assertTrue(yValueIndex.forValue(null).computeBitmapResult(resultFactory).get(i));
-        
Assert.assertTrue(yNulls.forNull().computeBitmapResult(resultFactory).get(i));
-      }
-      if (row.containsKey("z")) {
-        Assert.assertEquals(row.get("z"), zSelector.getObject());
-        Assert.assertTrue(zValueIndex.forValue((String) 
row.get("z")).computeBitmapResult(resultFactory).get(i));
-        
Assert.assertFalse(zNulls.forNull().computeBitmapResult(resultFactory).get(i));
-      } else {
-        Assert.assertNull(zSelector.getObject());
-        
Assert.assertTrue(zValueIndex.forValue(null).computeBitmapResult(resultFactory).get(i));
-        
Assert.assertTrue(zNulls.forNull().computeBitmapResult(resultFactory).get(i));
-      }
+
+      testPath(row, i, "v", vSelector, vDimSelector, vValueIndex, 
vPredicateIndex, vNulls, null);
+      testPath(row, i, "x", xSelector, xDimSelector, xValueIndex, 
xPredicateIndex, xNulls, ColumnType.LONG);
+      testPath(row, i, "y", ySelector, yDimSelector, yValueIndex, 
yPredicateIndex, yNulls, ColumnType.DOUBLE);
+      testPath(row, i, "z", zSelector, zDimSelector, zValueIndex, 
zPredicateIndex, zNulls, ColumnType.STRING);
+
       offset.increment();
     }
   }
 
+  private void testPath(
+      Map row,
+      int rowNumber,
+      String path,
+      ColumnValueSelector<?> valueSelector,
+      DimensionSelector dimSelector,
+      StringValueSetIndex valueSetIndex,
+      DruidPredicateIndex predicateIndex,
+      NullValueIndex nullValueIndex,
+      @Nullable ColumnType singleType
+  )
+  {
+    if (row.containsKey(path) && row.get(path) != null) {

Review Comment:
   I don't see a validation for `Assert.assertFalse(valueSelector.isNull())` 
inside of this if block.  I might just be blind though



##########
processing/src/test/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplierTest.java:
##########
@@ -726,6 +808,26 @@ public void testSingleValueLongWithNullRangeIndex() throws 
IOException
     ImmutableBitmap bitmap = forRange.computeBitmapResult(bitmapResultFactory);
     checkBitmap(bitmap, 0, 6, 7);
 
+    forRange = rangeIndex.forRange(100, true, 300, true);
+    Assert.assertEquals(0.0, forRange.estimateSelectivity(10), 0.0);
+    bitmap = forRange.computeBitmapResult(bitmapResultFactory);
+    checkBitmap(bitmap);
+
+    forRange = rangeIndex.forRange(100, false, 300, true);
+    Assert.assertEquals(0.2, forRange.estimateSelectivity(10), 0.0);
+    bitmap = forRange.computeBitmapResult(bitmapResultFactory);
+    checkBitmap(bitmap, 0, 6);

Review Comment:
   I see this expansion of the ranges checked for the single typed columns.  
When I looked at the broader file, I didn't see a similar test for the variant 
typed varieties.  Is there a reason it doesn't make sense to have a similar 
style of range test checker thingie for a variant column too?  Maybe have a 
pure 3-typed variant, a String+Long, String+Double and Long+Double?



##########
processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java:
##########
@@ -463,13 +462,13 @@ public void testExpectedTypes() throws Exception
         NESTED_SPARSE_MIXED_NUMERIC_FIELD
     );
     Assert.assertNotNull(sparseMixedNumericValueSelector);
-    Assert.assertTrue(sparseMixedNumericValueSelector instanceof 
DimensionDictionarySelector);
+    Assert.assertTrue(sparseMixedNumericValueSelector instanceof 
ColumnValueSelector);
 
     ColumnValueSelector sparseMixedValueSelector = 
columnSelectorFactory.makeColumnValueSelector(
         NESTED_SPARSE_MIXED_FIELD
     );
     Assert.assertNotNull(sparseMixedValueSelector);
-    Assert.assertTrue(sparseMixedValueSelector instanceof 
DimensionDictionarySelector);
+    Assert.assertTrue(sparseMixedValueSelector instanceof ColumnValueSelector);

Review Comment:
   I believe that this `instanceof` check would've been true before your 
changes too.  Perhaps you are wanting to ensure that it's not returning the 
`DimensionDictionarySelector`-based option, in which case maybe additionally 
have an
   
   ```
   Assert.assertFalse(sparseMixedValueSelector instanceof 
DimensionDictionarySelector);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to