daniel-adam-tfs commented on code in PR #547:
URL: https://github.com/apache/arrow-go/pull/547#discussion_r2478573444
##########
parquet/file/page_reader.go:
##########
@@ -500,14 +508,31 @@ func (p *serializedPageReader) Page() Page {
return p.curPage
}
+func (p *serializedPageReader) readUncompressed(rd io.Reader, lenUncompressed int, buf []byte) ([]byte, error) {
+ n, err := io.ReadFull(rd, buf[:lenUncompressed])
+ if err != nil {
+ return nil, err
+ }
+ if n != lenUncompressed {
+ return nil, fmt.Errorf("parquet: expected to read %d bytes but only read %d", lenUncompressed, n)
+ }
+ if p.cryptoCtx.DataDecryptor != nil {
Review Comment:
Alright, so I "steal" the buffer by using `Peek`/`Discard` when the data has already been read and is available in the `BufferedReader`. So in the uncompressed, unencrypted case, the data is read and stored in a buffer in `ReaderProperties.GetStream`, then copied into the user-provided buffer passed to `Float32ColumnChunkReader.ReadBatch`.
Now, if we have PLAIN encoding and no compression, it should be possible to write the data directly into the user-provided buffer, which would eliminate even that copy, but that one is more complicated and I need to start working on other stuff. :D
Also, the decryption types allocate buffers for the decrypted data. We could pass them an already allocated buffer, decrypt in place (if possible), or give them the custom allocator if one is set.
Anyway, I'll fix the decryption for DataPageV2 next and I'll consider this
one done.