imply-cheddar commented on code in PR #13561:
URL: https://github.com/apache/druid/pull/13561#discussion_r1049122636
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplier.java:
##########
@@ -574,22 +579,34 @@ public BitmapColumnIndex forValue(@Nullable String value)
public double estimateSelectivity(int totalRows)
{
if (longValue == null) {
- return (double) getBitmap(localDictionary.indexOf(0)).size() /
totalRows;
+ if (inputNull) {
+ return (double) getBitmap(localDictionary.indexOf(0)).size() /
totalRows;
Review Comment:
This is one of those hard-codings of null as 0. I wonder if maybe we should
add an `indexOfNull()` method that would only be implemented by the things that
actually have null as 0 as an extra safe-guard to ensure that we only do that
when we know that we are using null. The local implementation can special case
`0` more safely.
What do you think?
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplier.java:
##########
@@ -336,17 +336,21 @@ public BitmapColumnIndex forValue(@Nullable String value)
@Override
public double estimateSelectivity(int totalRows)
{
- return (double) getBitmap(
-
localDictionary.indexOf(stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value)))
- ).size() / totalRows;
+ final int globalId =
stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value));
+ if (globalId < 0) {
+ return 0.0;
+ }
+ return (double) getBitmap(localDictionary.indexOf(globalId)).size()
/ totalRows;
}
@Override
public <T> T computeBitmapResult(BitmapResultFactory<T>
bitmapResultFactory)
{
- return bitmapResultFactory.wrapDimensionValue(
-
getBitmap(localDictionary.indexOf(stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value))))
- );
+ final int globalId =
stringDictionary.indexOf(StringUtils.toUtf8ByteBuffer(value));
+ if (globalId < 0) {
+ return
bitmapResultFactory.wrapDimensionValue(bitmapFactory.makeEmptyImmutableBitmap());
+ }
+ return
bitmapResultFactory.wrapDimensionValue(getBitmap(localDictionary.indexOf(globalId)));
Review Comment:
We seem to do the same general checks between these two methods on a regular
basis. Maybe we should add an `AtomicReference<>` to the return object and
another method that builds and stores an object. It seems like the 3 possible
outcomes are
1. We are looking for `null`
2. We are looking for a value that doesn't exist
3. We are looking for a value that does exist
If the Reference was to an object that just maintained those 3 states, we
could do that once, cache it and then implement each of these methods off of
those states. I'm making this suggestion as I think this structuring might be
less bug prone (and potentially even faster as I believe both methods get
called in the course of a query, so not doing the same lookups multiple times
could be a gain, though likely very small).
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralDictionaryEncodedColumn.java:
##########
@@ -156,7 +156,11 @@ public String lookupName(int id)
@Override
public int lookupId(String name)
{
- return dictionary.indexOf(getIdFromGlobalDictionary(name));
+ final int globalId = getIdFromGlobalDictionary(name);
+ if (globalId < 0) {
+ return -1;
+ }
Review Comment:
Users of this aren't going to rely on the whole "here's the spot to insert"
semantics, right?
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralDictionaryEncodedColumn.java:
##########
@@ -509,6 +527,124 @@ public boolean isNull()
};
}
}
+ if (singleType == null) {
+ return new ColumnValueSelector<Object>()
+ {
+
+ private PeekableIntIterator nullIterator =
nullBitmap.peekableIterator();
+ private int nullMark = -1;
+ private int offsetMark = -1;
+
+ @Nullable
+ @Override
+ public Object getObject()
+ {
+ final int localId = column.get(offset.getOffset());
+ final int globalId = dictionary.get(localId);
+ if (globalId < globalDictionary.size()) {
+ return
StringUtils.fromUtf8Nullable(globalDictionary.get(globalId));
+ } else if (globalId < globalDictionary.size() +
globalLongDictionary.size()) {
+ return globalLongDictionary.get(globalId - adjustLongId);
+ } else {
+ return globalDoubleDictionary.get(globalId - adjustDoubleId);
+ }
+ }
+
+ @Override
+ public float getFloat()
+ {
+ final int localId = column.get(offset.getOffset());
+ final int globalId = dictionary.get(localId);
+ if (globalId == 0) {
+ // zero
+ assert NullHandling.replaceWithDefault();
Review Comment:
If we need the assert, do we not need an if check instead?
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedFieldLiteralDictionaryEncodedColumn.java:
##########
@@ -146,7 +146,7 @@ public String lookupName(int id)
final int globalId = dictionary.get(id);
if (globalId < globalDictionary.size()) {
return StringUtils.fromUtf8Nullable(globalDictionary.get(globalId));
- } else if (globalId < adjustLongId + globalLongDictionary.size()) {
+ } else if (globalId < globalDictionary.size() +
globalLongDictionary.size()) {
Review Comment:
Why are these numbers different from each other? Or is this change for
readability rather than correctness? If it is for readability, should we not
do the same thing in `NestedFieldLiteralColumnIndexSupplier.getIndexes`?
##########
processing/src/test/java/org/apache/druid/segment/nested/NestedDataColumnSupplierTest.java:
##########
@@ -216,72 +223,124 @@ private void smokeTest(NestedDataComplexColumn column)
throws IOException
Assert.assertEquals(ImmutableSet.of(ColumnType.LONG),
column.getColumnTypes(xPath));
Assert.assertEquals(ColumnType.LONG,
column.getColumnHolder(xPath).getCapabilities().toColumnType());
ColumnValueSelector<?> xSelector = column.makeColumnValueSelector(xPath,
offset);
+ DimensionSelector xDimSelector = column.makeDimensionSelector(xPath,
offset, null);
ColumnIndexSupplier xIndexSupplier = column.getColumnIndexSupplier(xPath);
Assert.assertNotNull(xIndexSupplier);
StringValueSetIndex xValueIndex =
xIndexSupplier.as(StringValueSetIndex.class);
+ DruidPredicateIndex xPredicateIndex =
xIndexSupplier.as(DruidPredicateIndex.class);
NullValueIndex xNulls = xIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> yPath = NestedPathFinder.parseJsonPath("$.y");
Assert.assertEquals(ImmutableSet.of(ColumnType.DOUBLE),
column.getColumnTypes(yPath));
Assert.assertEquals(ColumnType.DOUBLE,
column.getColumnHolder(yPath).getCapabilities().toColumnType());
ColumnValueSelector<?> ySelector = column.makeColumnValueSelector(yPath,
offset);
+ DimensionSelector yDimSelector = column.makeDimensionSelector(yPath,
offset, null);
ColumnIndexSupplier yIndexSupplier = column.getColumnIndexSupplier(yPath);
Assert.assertNotNull(yIndexSupplier);
StringValueSetIndex yValueIndex =
yIndexSupplier.as(StringValueSetIndex.class);
+ DruidPredicateIndex yPredicateIndex =
yIndexSupplier.as(DruidPredicateIndex.class);
NullValueIndex yNulls = yIndexSupplier.as(NullValueIndex.class);
final List<NestedPathPart> zPath = NestedPathFinder.parseJsonPath("$.z");
Assert.assertEquals(ImmutableSet.of(ColumnType.STRING),
column.getColumnTypes(zPath));
Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(zPath).getCapabilities().toColumnType());
ColumnValueSelector<?> zSelector = column.makeColumnValueSelector(zPath,
offset);
+ DimensionSelector zDimSelector = column.makeDimensionSelector(zPath,
offset, null);
ColumnIndexSupplier zIndexSupplier = column.getColumnIndexSupplier(zPath);
Assert.assertNotNull(zIndexSupplier);
StringValueSetIndex zValueIndex =
zIndexSupplier.as(StringValueSetIndex.class);
+ DruidPredicateIndex zPredicateIndex =
zIndexSupplier.as(DruidPredicateIndex.class);
NullValueIndex zNulls = zIndexSupplier.as(NullValueIndex.class);
- Assert.assertEquals(ImmutableList.of(xPath, yPath, zPath),
column.getNestedFields());
+ final List<NestedPathPart> vPath = NestedPathFinder.parseJsonPath("$.v");
+ Assert.assertEquals(ImmutableSet.of(ColumnType.STRING, ColumnType.LONG,
ColumnType.DOUBLE), column.getColumnTypes(vPath));
+ Assert.assertEquals(ColumnType.STRING,
column.getColumnHolder(vPath).getCapabilities().toColumnType());
+ ColumnValueSelector<?> vSelector = column.makeColumnValueSelector(vPath,
offset);
+ DimensionSelector vDimSelector = column.makeDimensionSelector(vPath,
offset, null);
+ ColumnIndexSupplier vIndexSupplier = column.getColumnIndexSupplier(vPath);
+ Assert.assertNotNull(vIndexSupplier);
+ StringValueSetIndex vValueIndex =
vIndexSupplier.as(StringValueSetIndex.class);
+ DruidPredicateIndex vPredicateIndex =
vIndexSupplier.as(DruidPredicateIndex.class);
+ NullValueIndex vNulls = vIndexSupplier.as(NullValueIndex.class);
+
+ Assert.assertEquals(ImmutableList.of(vPath, xPath, yPath, zPath),
column.getNestedFields());
for (int i = 0; i < data.size(); i++) {
Map row = data.get(i);
Assert.assertEquals(
JSON_MAPPER.writeValueAsString(row),
JSON_MAPPER.writeValueAsString(StructuredData.unwrap(rawSelector.getObject()))
);
- if (row.containsKey("x")) {
- Assert.assertEquals(row.get("x"), xSelector.getObject());
- Assert.assertEquals(row.get("x"), xSelector.getLong());
-
Assert.assertTrue(xValueIndex.forValue(String.valueOf(row.get("x"))).computeBitmapResult(resultFactory).get(i));
-
Assert.assertFalse(xNulls.forNull().computeBitmapResult(resultFactory).get(i));
- } else {
- Assert.assertNull(xSelector.getObject());
- Assert.assertTrue(xSelector.isNull());
-
Assert.assertTrue(xValueIndex.forValue(null).computeBitmapResult(resultFactory).get(i));
-
Assert.assertTrue(xNulls.forNull().computeBitmapResult(resultFactory).get(i));
- }
- if (row.containsKey("y")) {
- Assert.assertEquals(row.get("y"), ySelector.getObject());
- Assert.assertEquals(row.get("y"), ySelector.getDouble());
-
Assert.assertTrue(yValueIndex.forValue(String.valueOf(row.get("y"))).computeBitmapResult(resultFactory).get(i));
-
Assert.assertFalse(yNulls.forNull().computeBitmapResult(resultFactory).get(i));
- } else {
- Assert.assertNull(ySelector.getObject());
- Assert.assertTrue(ySelector.isNull());
-
Assert.assertTrue(yValueIndex.forValue(null).computeBitmapResult(resultFactory).get(i));
-
Assert.assertTrue(yNulls.forNull().computeBitmapResult(resultFactory).get(i));
- }
- if (row.containsKey("z")) {
- Assert.assertEquals(row.get("z"), zSelector.getObject());
- Assert.assertTrue(zValueIndex.forValue((String)
row.get("z")).computeBitmapResult(resultFactory).get(i));
-
Assert.assertFalse(zNulls.forNull().computeBitmapResult(resultFactory).get(i));
- } else {
- Assert.assertNull(zSelector.getObject());
-
Assert.assertTrue(zValueIndex.forValue(null).computeBitmapResult(resultFactory).get(i));
-
Assert.assertTrue(zNulls.forNull().computeBitmapResult(resultFactory).get(i));
- }
+
+ testPath(row, i, "v", vSelector, vDimSelector, vValueIndex,
vPredicateIndex, vNulls, null);
+ testPath(row, i, "x", xSelector, xDimSelector, xValueIndex,
xPredicateIndex, xNulls, ColumnType.LONG);
+ testPath(row, i, "y", ySelector, yDimSelector, yValueIndex,
yPredicateIndex, yNulls, ColumnType.DOUBLE);
+ testPath(row, i, "z", zSelector, zDimSelector, zValueIndex,
zPredicateIndex, zNulls, ColumnType.STRING);
+
offset.increment();
}
}
+ private void testPath(
+ Map row,
+ int rowNumber,
+ String path,
+ ColumnValueSelector<?> valueSelector,
+ DimensionSelector dimSelector,
+ StringValueSetIndex valueSetIndex,
+ DruidPredicateIndex predicateIndex,
+ NullValueIndex nullValueIndex,
+ @Nullable ColumnType singleType
+ )
+ {
+ if (row.containsKey(path) && row.get(path) != null) {
Review Comment:
I don't see a validation for `Assert.assertFalse(valueSelector.isNull())`
inside of this if block. I might just be blind though
##########
processing/src/test/java/org/apache/druid/segment/nested/NestedFieldLiteralColumnIndexSupplierTest.java:
##########
@@ -726,6 +808,26 @@ public void testSingleValueLongWithNullRangeIndex() throws
IOException
ImmutableBitmap bitmap = forRange.computeBitmapResult(bitmapResultFactory);
checkBitmap(bitmap, 0, 6, 7);
+ forRange = rangeIndex.forRange(100, true, 300, true);
+ Assert.assertEquals(0.0, forRange.estimateSelectivity(10), 0.0);
+ bitmap = forRange.computeBitmapResult(bitmapResultFactory);
+ checkBitmap(bitmap);
+
+ forRange = rangeIndex.forRange(100, false, 300, true);
+ Assert.assertEquals(0.2, forRange.estimateSelectivity(10), 0.0);
+ bitmap = forRange.computeBitmapResult(bitmapResultFactory);
+ checkBitmap(bitmap, 0, 6);
Review Comment:
I see this expansion of the ranges checked for the single typed columns.
When I looked at the broader file, I didn't see a similar test for the variant
typed varieties. Is there a reason it doesn't make sense to have a similar
style of range test checker thingie for a variant column too? Maybe have a
pure 3-typed variant, a String+Long, String+Double and Long+Double?
##########
processing/src/test/java/org/apache/druid/query/scan/NestedDataScanQueryTest.java:
##########
@@ -463,13 +462,13 @@ public void testExpectedTypes() throws Exception
NESTED_SPARSE_MIXED_NUMERIC_FIELD
);
Assert.assertNotNull(sparseMixedNumericValueSelector);
- Assert.assertTrue(sparseMixedNumericValueSelector instanceof
DimensionDictionarySelector);
+ Assert.assertTrue(sparseMixedNumericValueSelector instanceof
ColumnValueSelector);
ColumnValueSelector sparseMixedValueSelector =
columnSelectorFactory.makeColumnValueSelector(
NESTED_SPARSE_MIXED_FIELD
);
Assert.assertNotNull(sparseMixedValueSelector);
- Assert.assertTrue(sparseMixedValueSelector instanceof
DimensionDictionarySelector);
+ Assert.assertTrue(sparseMixedValueSelector instanceof ColumnValueSelector);
Review Comment:
I believe that this `instanceof` check would've been true before your
changes too. Perhaps you are wanting to ensure that it's not returning the
`DimensionDictionarySelector`-based option, in which case maybe additionally
have an
```
Assert.assertFalse(sparseMixedValueSelector instanceof
DimensionDictionarySelector);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]