Re: [PR] feat(parquet): utilize memory allocator in `serializedPageReader` [arrow-go]

via GitHub Fri, 29 Aug 2025 11:33:47 -0700


zeroshade commented on code in PR #485:
URL: https://github.com/apache/arrow-go/pull/485#discussion_r2310553142



##########
parquet/file/column_reader.go:
##########
@@ -437,16 +446,27 @@ func (c *columnChunkReader) initDataDecoder(page Page, 
lvlByteLen int64) error {
                        format.Encoding_DELTA_LENGTH_BYTE_ARRAY,
                        format.Encoding_DELTA_BINARY_PACKED,
                        format.Encoding_BYTE_STREAM_SPLIT:
-                       c.curDecoder = 
c.decoderTraits.Decoder(parquet.Encoding(encoding), c.descr, false, c.mem)
-                       c.decoders[encoding] = c.curDecoder
+                       c.curDecoder = 
c.decoderTraits.Decoder(parquet.Encoding(enc), c.descr, false, c.mem)
+                       c.decoders[enc] = c.curDecoder
                case format.Encoding_RLE_DICTIONARY:
                        return errors.New("parquet: dictionary page must be 
before data page")
                default:
-                       return fmt.Errorf("parquet: unknown encoding type %s", 
encoding)
+                       return fmt.Errorf("parquet: unknown encoding type %s", 
enc)
+               }
+       }
+
+       switch c.descr.PhysicalType() {
+       case parquet.Types.FixedLenByteArray:
+               c.curDecoder = &encoding.FixedLenByteArrayDecoderWrapper{
+                       FixedLenByteArrayDecoder: 
c.curDecoder.(encoding.FixedLenByteArrayDecoder),
+               }
+       case parquet.Types.ByteArray:
+               c.curDecoder = &encoding.ByteArrayDecoderWrapper{
+                       ByteArrayDecoder: 
c.curDecoder.(encoding.ByteArrayDecoder),

Review Comment:
   Yes, but remember that the code you're changing called `     buf := 
memory.NewResizableBuffer(p.mem)` which does allocate a new buffer that the 
data is copied into before the changes you're making. The changes in this PR 
are introducing buffer re-use that wasn't there before. 
   
   I'm not saying it's a bad thing, just pointing out part of the reason for 
the previous behavior that allocated new buffers was to prevent the problem 
you've identified. Now that we're reusing these buffers in this change, we have 
to do the copying you're proposing.
   
   Thats all



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] feat(parquet): utilize memory allocator in `serializedPageReader` [arrow-go]

Reply via email to