clintropolis commented on code in PR #13732:
URL: https://github.com/apache/druid/pull/13732#discussion_r1095431083


##########
processing/src/main/java/org/apache/druid/segment/nested/NestedDataColumnSupplier.java:
##########
@@ -168,22 +173,85 @@ public NestedDataColumnSupplier(
         );
         if (metadata.hasNulls()) {
           columnBuilder.setHasNulls(true);
-          final ByteBuffer nullIndexBuffer = loadInternalFile(mapper, 
NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME);
+          final ByteBuffer nullIndexBuffer = loadInternalFile(
+              mapper,
+              metadata,
+              NestedDataColumnSerializer.NULL_BITMAP_FILE_NAME
+          );
           nullValues = 
metadata.getBitmapSerdeFactory().getObjectStrategy().fromByteBufferWithSize(nullIndexBuffer);
         } else {
           nullValues = 
metadata.getBitmapSerdeFactory().getBitmapFactory().makeEmptyImmutableBitmap();
         }
+
+        return new NestedDataColumnSupplier(
+            version,
+            metadata,
+            fields,
+            fieldInfo,
+            compressedRawColumnSupplier,
+            nullValues,
+            stringDictionary,
+            frontCodedStringDictionarySupplier,
+            longDictionarySupplier,
+            doubleDictionarySupplier,
+            columnConfig,
+            mapper,
+            simpleType
+        );

Review Comment:
   i didn't actually change anything here, i just moved the logic out of the 
constructor into a static method per a request of yours on a previous PR. the 
nested column segments are pretty light due to the use of suppliers, 
particularly when using `FrontCodedIndexed` for the string dictionary 
https://github.com/apache/druid/pull/12277#discussion_r1004350233, so they 
consist mostly of a reference to the underlying buffer from the mmap segment 
and the supplier object until materialized into an indexed for queries. I did 
this with heap usage specifically in mind.
   
   `GenericIndexed` is the most expensive part of segment heap usage afaict, 
and the limited use of it here helps keep things light, because its both 
materialized into an `Indexed` and also still functions like a supplier with 
the 'singleThreaded' method, and usually is the driver of heap size in the 
dumps i've seen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to