joechenrh commented on code in PR #485:
URL: https://github.com/apache/arrow-go/pull/485#discussion_r2310514899
##########
parquet/file/column_reader.go:
##########
@@ -437,16 +446,27 @@ func (c *columnChunkReader) initDataDecoder(page Page, lvlByteLen int64) error {
       format.Encoding_DELTA_LENGTH_BYTE_ARRAY,
       format.Encoding_DELTA_BINARY_PACKED,
       format.Encoding_BYTE_STREAM_SPLIT:
-      c.curDecoder = c.decoderTraits.Decoder(parquet.Encoding(encoding), c.descr, false, c.mem)
-      c.decoders[encoding] = c.curDecoder
+      c.curDecoder = c.decoderTraits.Decoder(parquet.Encoding(enc), c.descr, false, c.mem)
+      c.decoders[enc] = c.curDecoder
     case format.Encoding_RLE_DICTIONARY:
       return errors.New("parquet: dictionary page must be before data page")
     default:
-      return fmt.Errorf("parquet: unknown encoding type %s", encoding)
+      return fmt.Errorf("parquet: unknown encoding type %s", enc)
+    }
+  }
+
+  switch c.descr.PhysicalType() {
+  case parquet.Types.FixedLenByteArray:
+    c.curDecoder = &encoding.FixedLenByteArrayDecoderWrapper{
+      FixedLenByteArrayDecoder: c.curDecoder.(encoding.FixedLenByteArrayDecoder),
+    }
+  case parquet.Types.ByteArray:
+    c.curDecoder = &encoding.ByteArrayDecoderWrapper{
+      ByteArrayDecoder: c.curDecoder.(encoding.ByteArrayDecoder),
Review Comment:
I haven't hit this situation in practice, but I think it can happen.
Following the example above:
- Get buffer 1 -> read data from page 1 -> release buffer 1
- Get buffer 2 -> read data from page 2 -> release buffer 2
- Get buffer 1 again
Since `values` may point directly into these buffers, its contents could be
overwritten when subsequent pages are read:
https://github.com/apache/arrow-go/blob/c6ce2ef4e55009a786cf04b3845eba5170c98066/parquet/file/column_reader_types.gen.go#L216-L220
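
To make the concern concrete, here is a minimal standalone sketch of the aliasing hazard. It does not use the arrow-go API: `bufferPool`, `Get`, `Put`, and `aliasDemo` are hypothetical names, and the pool is a simple LIFO free list chosen only so that buffer reuse is deterministic. In this sketch the pool hands the same backing array straight back on the next `Get`, which is a shortened version of the sequence above but has the same effect.

```go
package main

import "fmt"

// bufferPool is a hypothetical LIFO free list, not the allocator used by
// arrow-go. It exists only to make buffer reuse deterministic.
type bufferPool struct{ free [][]byte }

// Get returns the most recently released buffer if it is large enough,
// otherwise it allocates a new one.
func (p *bufferPool) Get(n int) []byte {
	if k := len(p.free); k > 0 {
		b := p.free[k-1]
		p.free = p.free[:k-1]
		if cap(b) >= n {
			return b[:n]
		}
	}
	return make([]byte, n)
}

// Put releases a buffer back to the pool for later reuse.
func (p *bufferPool) Put(b []byte) { p.free = append(p.free, b) }

// aliasDemo reads two "pages" through the pool. It returns what a zero-copy
// view of page 1 looks like after page 2 has been read, next to a copy of
// page 1 that was taken before the buffer was released.
func aliasDemo() (aliased, copied string) {
	pool := &bufferPool{}

	buf1 := pool.Get(5)
	copy(buf1, "page1")
	values := buf1[:5]                     // zero-copy view into the pooled buffer
	safe := append([]byte(nil), values...) // defensive copy made before release
	pool.Put(buf1)

	buf2 := pool.Get(5) // same backing array as buf1
	copy(buf2, "page2")
	pool.Put(buf2)

	return string(values), string(safe)
}

func main() {
	aliased, copied := aliasDemo()
	fmt.Println("zero-copy view:", aliased)
	fmt.Println("copied value:  ", copied)
}
```

The zero-copy view ends up holding page 2's bytes, while the copy still holds page 1's. Whether the real reader can reach this state depends on when its buffers are actually released and reused, which this sketch does not model.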