clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095431083
##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
);
if (metadata.hasNulls()) {
columnBuilder.setHasNulls(true);
- final ByteBuffer nullIndexBuffer = loadInternalFile(mapper,
NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+ final ByteBuffer nullIndexBuffer = loadInternalFile(
+ mapper,
+ metadata,
+ NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+ );
nullValues =
metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
} else {
nullValues =
metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
}
+
+ return new NestedDataColumnSupplier(
+ version,
+ metadata,
+ fields,
+ fieldInfo,
+ compressedRawColumnSupplier,
+ nullValues,
+ stringDictionary,
+ frontCodedStringDictionarySupplier,
+ longDictionarySupplier,
+ doubleDictionarySupplier,
+ columnConfig,
+ mapper,
+ simpleType
+ );
Review Comment:
i didn't actually change anything here, i just moved the logic out of the
constructor into a static method per a request of yours on a previous PR. the
nested column segments are pretty light due to the use of suppliers,
particularly when using `FrontCodedIndexed` for the string dictionary
https://github.com/apache/druid/pull/12277#discussion_r1004350233, so they
consist mostly of a reference to the underlying buffer from the mmap segment
and the supplier object until materialized into an indexed for queries. I did
this with heap usage specifically in mind.
`GenericIndexed` is the most expensive part of segment heap usage afaict,
and the limited use of it here helps keep things light, because its both
materialized into an `Indexed` and also still functions like a supplier with
the 'singleThreaded' method, and usually is the driver of heap size in the
dumps i've seen.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]