imply-cheddar commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095351855
##########
processing/src/main/java/org/apache/druid/segment/vector/NilVectorSelector.java:
##########
@@ -82,9 +81,11 @@ public static NilVectorSelector create(final
VectorSizeInspector vectorSizeInspe
DEFAULT_OBJECT_VECTOR
);
} else {
+ final boolean[] nulls = new
boolean[vectorSizeInspector.getMaxVectorSize()];
+ Arrays.fill(nulls, NullHandling.sqlCompatible());
Review Comment:
The default initialization for a native `boolean` is false, so we should
really only do this if `NullHandling.sqlCompatible()` is true.
##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
ReadableOffset offset
)
{
- final NestedDataComplexColumn column =
NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
- if (column == null) {
- // complex column itself didn't exist
- return DimensionSelector.constant(null);
+ ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+ if (holder == null) {
+ // column doesn't exist
+ return dimensionSpec.decorate(DimensionSelector.constant(null,
dimensionSpec.getExtractionFn()));
}
if (hasNegativeArrayIndex) {
- return new FieldDimensionSelector(
- new RawFieldLiteralColumnValueSelector(
- column.makeColumnValueSelector(offset),
- parts
- )
- );
+ // if the path has negative array elements, then we have to use the
'raw' processing of the FieldDimensionSelector
+ // created with the column selector factory instead of using the
optimized nested field column, return null
+ // to fall through
+ return null;
}
- return column.makeDimensionSelector(parts, offset,
dimensionSpec.getExtractionFn());
+
+ return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder,
offset, dimensionSpec.getExtractionFn()));
+ }
+
+ private DimensionSelector makeDimensionSelectorUndecorated(
+ ColumnHolder holder,
+ ReadableOffset offset,
+ @Nullable ExtractionFn extractionFn
+ )
+ {
+ BaseColumn theColumn = holder.getColumn();
+ if (theColumn instanceof NestedDataComplexColumn) {
+ final NestedDataComplexColumn column = (NestedDataComplexColumn)
theColumn;
+ return column.makeDimensionSelector(parts, offset, extractionFn);
+ }
+
+ // ok, so not a nested column, but we can still do stuff
+ if (!parts.isEmpty()) {
+ // we are being asked for a path that will never exist, so we are null
selector
+ return DimensionSelector.constant(null, extractionFn);
+ }
Review Comment:
Is this because `parts.isEmpty()` means that we are looking for the root?
And if we are not looking for the root then it's impossible to have something?
If so, can we move the stuff that's after this if to be inside of the if (to
eliminate the negation) and adjust the comments to explain the relationship
between `parts.isEmpty()` and the "real world/query"?
##########
processing/src/test/resources/simple-nested-test-data.json:
##########
@@ -1,7 +1,7 @@
{"timestamp": "2021-01-01", "dim": "hello", "nest": {"x": 100, "y": 200, "z":
300}, "nester":{ "x": ["a", "b", "c"], "y": {"a": "a", "b": "b", "c": [1, 2,
3]}}, "variant": {"a": ["hello", "world"], "b": {"x": "hello", "y": "world"}},
"list":[{"x": 5, "y": 10}, {"x": 15, "y": 22}]}
{"timestamp": "2021-01-01", "dim": "hello", "nester":{ "x": ["x", "y", "z"]},
"list":[{"x": 35, "y": 310}, {"x": 315, "y": 322}]}
{"timestamp": "2021-01-01", "dim": "hello", "nest":{ "x": 300, "y": 800},
"nester": "hello"}
-{"timestamp": "2021-01-01", "dim": "hello", "nest":{ "y": 500}, "list":[{"x":
115, "y": 410}, {"x": 415, "y": 422}]}
+{"timestamp": "2021-01-01", "dim": "100", "nest":{ "y": 500}, "list":[{"x":
115, "y": 410}, {"x": 415, "y": 422}]}
Review Comment:
I noticed that it's still a string. Was this change also wanting to test
adding a new type and having it be a number? Or is this being used to validate
that the parsing of numbers works?
##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@
ReadableOffset offset
)
{
- final NestedDataComplexColumn column =
NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
- if (column == null) {
- // complex column itself didn't exist
- return DimensionSelector.constant(null);
+ ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+ if (holder == null) {
+ // column doesn't exist
+ return dimensionSpec.decorate(DimensionSelector.constant(null));
}
if (hasNegativeArrayIndex) {
- return new FieldDimensionSelector(
- new RawFieldLiteralColumnValueSelector(
- column.makeColumnValueSelector(offset),
- parts
- )
- );
+ // if the path has negative array elements, then we have to use the
'raw' processing of the FieldDimensionSelector
+ // created with the column selector factory instead of using the
optimized nested field column, return null
+ // to fall through
+ return null;
+ }
+
+ return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder,
offset, dimensionSpec.getExtractionFn()));
Review Comment:
True statement, should be ignored in this case.
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteNestedDataQueryTest.java:
##########
@@ -174,7 +198,72 @@
.rows(ROWS)
.buildMMappedIndex();
- return new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+ final QueryableIndex indexMix11 =
+ IndexBuilder.create()
+ .tmpDir(temporaryFolder.newFolder())
+
.segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+ .schema(
+ new IncrementalIndexSchema.Builder()
+ .withMetrics(
+ new CountAggregatorFactory("cnt")
+ )
+ .withDimensionsSpec(PARSER)
Review Comment:
Maybe make this call with `PARSER.getParseSpec().getDimensionSpec()` instead
and avoid usage of the deprecated method?
##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@ public DimensionSelector makeDimensionSelector(
ReadableOffset offset
)
{
- final NestedDataComplexColumn column =
NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
- if (column == null) {
- // complex column itself didn't exist
- return DimensionSelector.constant(null);
+ ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+ if (holder == null) {
+ // column doesn't exist
+ return dimensionSpec.decorate(DimensionSelector.constant(null,
dimensionSpec.getExtractionFn()));
}
if (hasNegativeArrayIndex) {
- return new FieldDimensionSelector(
- new RawFieldLiteralColumnValueSelector(
- column.makeColumnValueSelector(offset),
- parts
- )
- );
+ // if the path has negative array elements, then we have to use the
'raw' processing of the FieldDimensionSelector
+ // created with the column selector factory instead of using the
optimized nested field column, return null
+ // to fall through
+ return null;
}
- return column.makeDimensionSelector(parts, offset,
dimensionSpec.getExtractionFn());
+
+ return dimensionSpec.decorate(makeDimensionSelectorUndecorated(holder,
offset, dimensionSpec.getExtractionFn()));
+ }
+
+ private DimensionSelector makeDimensionSelectorUndecorated(
+ ColumnHolder holder,
+ ReadableOffset offset,
+ @Nullable ExtractionFn extractionFn
+ )
+ {
+ BaseColumn theColumn = holder.getColumn();
+ if (theColumn instanceof NestedDataComplexColumn) {
+ final NestedDataComplexColumn column = (NestedDataComplexColumn)
theColumn;
+ return column.makeDimensionSelector(parts, offset, extractionFn);
+ }
+
+ // ok, so not a nested column, but we can still do stuff
+ if (!parts.isEmpty()) {
+ // we are being asked for a path that will never exist, so we are null
selector
+ return DimensionSelector.constant(null, extractionFn);
+ }
+
+ // the path was the 'root', we're in luck, spit out a selector that
behaves the same way as a nested column
Review Comment:
why does the `instanceof DictionaryEncoded` check tell us that the path was
the root? Or is that part intended to be explaining the if before this?
##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -258,18 +299,37 @@ public ColumnValueSelector<?> makeColumnValueSelector(
ReadableOffset offset
)
{
- final NestedDataComplexColumn column =
NestedDataComplexColumn.fromColumnSelector(columnSelector, this.columnName);
- if (column == null) {
+ ColumnHolder holder = columnSelector.getColumnHolder(this.columnName);
+ if (holder == null) {
return NilColumnValueSelector.instance();
}
+ BaseColumn theColumn = holder.getColumn();
- // processFromRaw is true, that means JSON_QUERY, which can return partial
results, otherwise this virtual column
- // is JSON_VALUE which only returns literals, so we can use the nested
columns value selector
- return processFromRaw
- ? new
RawFieldColumnSelector(column.makeColumnValueSelector(offset), parts)
- : hasNegativeArrayIndex
- ? new
RawFieldLiteralColumnValueSelector(column.makeColumnValueSelector(offset),
parts)
- : column.makeColumnValueSelector(parts, offset);
+ if (processFromRaw || hasNegativeArrayIndex) {
+ // if the path has negative array elements, or has set the flag to
process 'raw' values explicitly (JSON_QUERY),
+ // then we use the 'raw' processing of the
RawFieldColumnSelector/RawFieldLiteralColumnValueSelector created
+ // with the column selector factory instead of using the optimized
nested field column
+ return null;
+ }
Review Comment:
I understand why processFromRaw takes us into this if, I don't understand
why `hasNegativeArrayIndex` does?
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
);
if (metadata.hasNulls()) {
columnBuilder.setHasNulls(true);
- final ByteBuffer nullIndexBuffer = loadInternalFile(mapper,
NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+ final ByteBuffer nullIndexBuffer = loadInternalFile(
+ mapper,
+ metadata,
+ NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+ );
nullValues =
metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
} else {
nullValues =
metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
}
+
+ return new NestedDataColumnSupplier(
+ version,
+ metadata,
+ fields,
+ fieldInfo,
+ compressedRawColumnSupplier,
+ nullValues,
+ stringDictionary,
+ frontCodedStringDictionarySupplier,
+ longDictionarySupplier,
+ doubleDictionarySupplier,
+ columnConfig,
+ mapper,
+ simpleType
+ );
Review Comment:
When building this, it appears like we are loading various files to then use
in the suppliers. Given the recent sensitivity around the amount of data held
on JVM heap per segment, I'm curious if we are being as judicious as possible
with our JVM memory. We have to draw a fine line between performance and not
performing hard-work multiple times and JVM heap, 'cause we can't fit
everything in it.
With the changes here, I would have to dig in further to really get a sense
for how it is changing the JVM heap memory usage, so I'm just wondering what
your thought process was around these changes and their impact to heap memory?
##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/SegmentMetadataCache.java:
##########
@@ -798,7 +798,20 @@ DatasourceTable.PhysicalDatasourceMetadata
buildDruidTable(final String dataSour
rowSignature.getColumnType(column)
.orElseThrow(() -> new ISE("Encountered null type
for column [%s]", column));
- columnTypes.putIfAbsent(column, columnType);
+ columnTypes.compute(column, (c, existingType) -> {
+ if (existingType == null) {
+ return columnType;
+ }
+ if (columnType == null) {
+ return existingType;
+ }
+ // if any are json, are all json
+ if (NestedDataComplexTypeSerde.TYPE.equals(columnType) ||
NestedDataComplexTypeSerde.TYPE.equals(existingType)) {
+ return NestedDataComplexTypeSerde.TYPE;
+ }
+ // "existing type" is the 'newest' type, since we iterate the
segments list by newest start time
+ return existingType;
+ });
Review Comment:
This is a question not actually about your code, but the code that existed
before it... Given that this is using a compute to put values into the map,
how do these things change?
##########
processing/src/main/java/org/apache/druid/segment/virtual/NestedFieldVirtualColumn.java:
##########
@@ -233,20 +245,49 @@
ReadableOffset offset
)
{
- final NestedDataComplexColumn column =
NestedDataComplexColumn.fromColumnSelector(columnSelector, columnName);
- if (column == null) {
- // complex column itself didn't exist
- return DimensionSelector.constant(null);
+ ColumnHolder holder = columnSelector.getColumnHolder(columnName);
+ if (holder == null) {
+ // column doesn't exist
+ return dimensionSpec.decorate(DimensionSelector.constant(null,
dimensionSpec.getExtractionFn()));
Review Comment:
True statement, should be ignored in this case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]