Re: [PR] Various fixes for large columns. (druid)

via GitHub Sun, 02 Feb 2025 17:45:32 -0800


cryptoe commented on code in PR #17691:
URL: https://github.com/apache/druid/pull/17691#discussion_r1938661068



##########
processing/src/main/java/org/apache/druid/segment/data/V3CompressedVSizeColumnarMultiIntsSerializer.java:
##########
@@ -35,33 +35,47 @@ public class V3CompressedVSizeColumnarMultiIntsSerializer 
extends ColumnarMultiI
 {
   private static final byte VERSION = 
V3CompressedVSizeColumnarMultiIntsSupplier.VERSION;
 
+  /**
+   * Creates a new serializer.
+   *
+   * @param columnName            name of the column to write
+   * @param segmentWriteOutMedium supplier of temporary files
+   * @param filenameBase          base filename to be used for secondary 
files, if multiple files are needed
+   * @param maxValue              maximum integer value that will be written 
to the column
+   * @param compression           compression strategy to apply
+   * @param fileSizeLimit         limit for files created by the writer. In 
production code, this should always be
+   *                              {@link GenericIndexedWriter#MAX_FILE_SIZE}. 
The parameter is exposed only for testing.
+   */
   public static V3CompressedVSizeColumnarMultiIntsSerializer create(
       final String columnName,
       final SegmentWriteOutMedium segmentWriteOutMedium,
       final String filenameBase,
       final int maxValue,
-      final CompressionStrategy compression
+      final CompressionStrategy compression,
+      final int fileSizeLimit
   )
   {
     return new V3CompressedVSizeColumnarMultiIntsSerializer(
         columnName,
         new CompressedColumnarIntsSerializer(
             columnName,
             segmentWriteOutMedium,
-            filenameBase,
+            filenameBase + ".offsets",

Review Comment:
   Could you please explain why the `offsets` and `value` need to be added. Is 
it `QOL` improvements and not directly related to this patch ?
   How does backward compatibility look post this change, ie can older segment 
reading code still read these files ?



##########
processing/src/main/java/org/apache/druid/segment/data/GenericIndexedWriter.java:
##########
@@ -72,18 +73,31 @@ public class GenericIndexedWriter<T> implements 
DictionaryWriter<T>
       .writeByteArray(x -> x.fileNameByteArray);
 
 
+  /**
+   * Creates a new writer that accepts byte buffers and compresses them.
+   *
+   * @param segmentWriteOutMedium supplier of temporary files
+   * @param filenameBase          base filename to be used for secondary 
files, if multiple files are needed
+   * @param compressionStrategy   compression strategy to apply
+   * @param bufferSize            size of the buffers that will be passed in
+   * @param fileSizeLimit         limit for files created by the writer. In 
production code, this should always be
+   *                              {@link GenericIndexedWriter#MAX_FILE_SIZE}. 
The parameter is exposed only for testing.
+   * @param closer                closer to attach temporary compression 
buffers to
+   */
   public static GenericIndexedWriter<ByteBuffer> ofCompressedByteBuffers(
       final SegmentWriteOutMedium segmentWriteOutMedium,
       final String filenameBase,
       final CompressionStrategy compressionStrategy,
       final int bufferSize,
+      final int fileSizeLimit,
       final Closer closer
   )
   {
     GenericIndexedWriter<ByteBuffer> writer = new GenericIndexedWriter<>(
         segmentWriteOutMedium,
         filenameBase,
-        compressedByteBuffersWriteObjectStrategy(compressionStrategy, 
bufferSize, closer)
+        compressedByteBuffersWriteObjectStrategy(compressionStrategy, 
bufferSize, closer),
+        fileSizeLimit

Review Comment:
   @gianm It looks like this is the crux of the change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Various fixes for large columns. (druid)

Reply via email to