joellubi commented on PR #43066:
URL: https://github.com/apache/arrow/pull/43066#issuecomment-2211258792

   > @joellubi For arrow C++ usecase, encode is implemented like this patch 
now, decode, however is implemented by batch:
   
   Thanks @mapleFU. I think it would be nice to keep the behavior aligned, but 
there is a slight difference in how the Go and cpp implementations batch reads.
   
   In cpp, the 
[ReadValues](https://github.com/apache/arrow/blob/5b5c164a6a467af2803e927b2de1b9b6ee5de895/cpp/src/parquet/column_reader.cc#L664-L671)
 method reads "up to batch_size values from the current data page".
   
   In Go, the 
[readBatch](https://github.com/apache/arrow/blob/5b5c164a6a467af2803e927b2de1b9b6ee5de895/go/parquet/file/column_reader.go#L487-L527)
 method "will read until it either reads in batchSize values or it hits the end 
of the column chunk, including reading multiple pages".
   
   Since all values must be decoded within the window of a single page, it's 
safe to decode the page when `SetData` is called in Go, whereas an entire batch 
may in general span multiple pages. In cpp, the number of values read in a 
single batch is limited to the values left in the current page, so it's safe to 
read in separate batches without crossing a page boundary.

