etseidl commented on PR #34054:
URL: https://github.com/apache/arrow/pull/34054#issuecomment-1426607039

   > Yes I have already noticed that a record may span across different pages.
   > But in the parquet-cpp, the page size check always happens at the end of each
   > batch. Therefore it guarantees that a page will not split any record. Please
   > check this function as well as where it is called for reference:
   > https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_writer.cc#L1376
   
   Perhaps I'm misunderstanding, but the function you referenced appears to be
called after a batch of values is written. I don't see where it is guaranteed
that the end of a batch is also the end of a row. But thanks for working on the
page indexes; I think it's an important feature that arrow-cpp currently lacks.
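
   To make the concern concrete, here is a minimal sketch. It is not parquet-cpp code: the helper names `EndsOnRecordBoundary` and `SplittingBatchEnds` are hypothetical, and fixed-size batches are an assumption made only for illustration. It relies on the Parquet convention that a repetition level of 0 marks the start of a new record, so a batch ends on a row boundary only if the next level is 0 or there are no more levels.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of parquet-cpp): true if a batch that ends
// just before index `end` stops on a record boundary. A repetition level of
// 0 starts a new record, so the level at `end` must be 0 (or not exist).
bool EndsOnRecordBoundary(const std::vector<int16_t>& rep_levels,
                          std::size_t end) {
  return end >= rep_levels.size() || rep_levels[end] == 0;
}

// Hypothetical helper: for fixed-size batches (an assumption for this
// sketch), return every batch end offset that falls in the middle of a
// record. A page flushed at such an offset would split that record.
std::vector<std::size_t> SplittingBatchEnds(
    const std::vector<int16_t>& rep_levels, std::size_t batch_size) {
  std::vector<std::size_t> splitting;
  if (batch_size == 0) return splitting;
  for (std::size_t end = batch_size; end < rep_levels.size();
       end += batch_size) {
    if (!EndsOnRecordBoundary(rep_levels, end)) splitting.push_back(end);
  }
  return splitting;
}
```

   For the levels `{0, 1, 1, 0, 1, 0}` (three records of 3, 2, and 1 values), batches of 2 values end mid-record at offsets 2 and 4, while batches of 3 values happen to line up with record boundaries. Whether the writer's real batches can end mid-record is exactly what I'm asking about.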

