joechenrh commented on code in PR #485: URL: https://github.com/apache/arrow-go/pull/485#discussion_r2310514899
########## parquet/file/column_reader.go: ##########
@@ -437,16 +446,27 @@ func (c *columnChunkReader) initDataDecoder(page Page, lvlByteLen int64) error {
 			format.Encoding_DELTA_LENGTH_BYTE_ARRAY,
 			format.Encoding_DELTA_BINARY_PACKED,
 			format.Encoding_BYTE_STREAM_SPLIT:
-			c.curDecoder = c.decoderTraits.Decoder(parquet.Encoding(encoding), c.descr, false, c.mem)
-			c.decoders[encoding] = c.curDecoder
+			c.curDecoder = c.decoderTraits.Decoder(parquet.Encoding(enc), c.descr, false, c.mem)
+			c.decoders[enc] = c.curDecoder
 		case format.Encoding_RLE_DICTIONARY:
 			return errors.New("parquet: dictionary page must be before data page")
 		default:
-			return fmt.Errorf("parquet: unknown encoding type %s", encoding)
+			return fmt.Errorf("parquet: unknown encoding type %s", enc)
+		}
+	}
+
+	switch c.descr.PhysicalType() {
+	case parquet.Types.FixedLenByteArray:
+		c.curDecoder = &encoding.FixedLenByteArrayDecoderWrapper{
+			FixedLenByteArrayDecoder: c.curDecoder.(encoding.FixedLenByteArrayDecoder),
+		}
+	case parquet.Types.ByteArray:
+		c.curDecoder = &encoding.ByteArrayDecoderWrapper{
+			ByteArrayDecoder: c.curDecoder.(encoding.ByteArrayDecoder),

Review Comment:
I haven't run into this situation, but I think it could happen. Following the example above:
- Get buffer1 -> read data from page 1 -> release buffer1
- Get buffer2 -> read data from page 2 -> release buffer2
- Get buffer1 again -> read data from page 3 into it

Since `values` might point directly into these buffers, the values already returned for earlier pages could be modified when subsequent pages are read:
https://github.com/apache/arrow-go/blob/c6ce2ef4e55009a786cf04b3845eba5170c98066/parquet/file/column_reader_types.gen.go#L216-L220
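To make the hazard concrete, here is a minimal self-contained sketch in Go. It does not use the arrow-go API: `bufferPool`, `decodePage`, and `firstPageAfterSecond` are hypothetical names, and the pool is a simple LIFO free list standing in for whatever allocator the reader actually uses. The sequence is condensed to two pages: page 1's values are sub-slices of a pooled buffer, that buffer is released, and page 2 is then decoded into the same buffer.

```go
package main

// bufferPool is a hypothetical stand-in for a memory pool that hands out
// reusable byte buffers (it is NOT the arrow-go memory allocator API).
type bufferPool struct {
	free [][]byte
}

// get returns the most recently released buffer if it is large enough,
// otherwise it allocates a new one.
func (p *bufferPool) get(size int) []byte {
	if n := len(p.free); n > 0 && cap(p.free[n-1]) >= size {
		buf := p.free[n-1]
		p.free = p.free[:n-1]
		return buf[:size]
	}
	return make([]byte, size)
}

// release hands a buffer back to the pool so a later get can reuse it.
func (p *bufferPool) release(buf []byte) {
	p.free = append(p.free, buf)
}

// decodePage (hypothetical) copies a page's raw bytes into a pooled scratch
// buffer and returns each fixed-width value as a sub-slice of that buffer.
// The buffer is released before returning, mirroring the "read, then
// release" pattern. When copyOut is false the returned values still alias
// the released buffer.
func decodePage(pool *bufferPool, page []byte, width int, copyOut bool) [][]byte {
	buf := pool.get(len(page))
	copy(buf, page)
	defer pool.release(buf)

	values := make([][]byte, 0, len(page)/width)
	for off := 0; off+width <= len(buf); off += width {
		v := buf[off : off+width]
		if copyOut {
			// Detach the value from the scratch buffer so reuse cannot change it.
			v = append([]byte(nil), v...)
		}
		values = append(values, v)
	}
	return values
}

// firstPageAfterSecond decodes two pages in sequence and reports what the
// values returned for page 1 look like after page 2 has been read.
func firstPageAfterSecond(copyOut bool) []string {
	pool := &bufferPool{}
	page1 := decodePage(pool, []byte("aaaabbbb"), 4, copyOut)
	_ = decodePage(pool, []byte("ccccdddd"), 4, copyOut) // reuses page 1's buffer
	out := make([]string, len(page1))
	for i, v := range page1 {
		out[i] = string(v)
	}
	return out
}
```

With aliasing (`copyOut == false`), page 1's values come back as "cccc" and "dddd" because page 2 was decoded into the same buffer; with `copyOut == true` they stay "aaaa" and "bbbb". Copying each value out of the scratch buffer is only one possible mitigation, shown for illustration; it is not necessarily what the wrappers in this PR do.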