wgtmac commented on PR #34054:
URL: https://github.com/apache/arrow/pull/34054#issuecomment-1426803534

   > > Please correct me if I am wrong. At least the arrow parquet writer 
guarantees this by calling ColumnWriter::WriteArrow like this: 
https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.cc#L154.
 Yes, the ParquetFileWriter itself does not prevent this.
   > 
   > I think @etseidl is correct here. WriteArrow works on leaf arrays and, 
IIRC, the rep/def levels in the code references are the only way to recover record 
boundaries. Sorry, it's been a busy week; I will aim to catch up on reviews next 
week. It would also be nice not to special-case this for Arrow, even if it does 
somehow work there.
   
   @emkornfield Thanks for the explanation! No problem; this PR is not ready for 
review yet anyway because of a series of blocking issues ahead of it.
   
   I strongly agree that writing via Arrow should not be a special case. It 
sounds like splitting pages at record boundaries is now a new blocking issue.
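   
   To illustrate the point about rep levels: in Parquet's encoding, a repetition 
level of 0 marks the first value of a new record, so record boundaries can be 
recovered from the rep levels alone. Below is a minimal sketch, assuming the rep 
levels are available as a plain vector. `RecordStarts` and `LastSafeSplit` are 
hypothetical helpers for illustration only, not functions in the Parquet C++ 
code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the Parquet C++ API).
// Returns every index in `rep_levels` at which a new record starts.
// A repetition level of 0 means the value begins a new top-level record;
// any level > 0 continues the current record.
std::vector<std::size_t> RecordStarts(const std::vector<int16_t>& rep_levels) {
  std::vector<std::size_t> starts;
  for (std::size_t i = 0; i < rep_levels.size(); ++i) {
    if (rep_levels[i] == 0) starts.push_back(i);
  }
  return starts;
}

// Hypothetical helper (not part of the Parquet C++ API).
// A page holding levels [0, p) ends on a record boundary if p is the end of
// the buffer or the level at p starts a new record. Returns the largest such
// p that does not exceed `limit`, or 0 if no boundary fits within the limit.
std::size_t LastSafeSplit(const std::vector<int16_t>& rep_levels,
                          std::size_t limit) {
  std::size_t p = std::min(limit, rep_levels.size());
  while (p > 0 && p < rep_levels.size() && rep_levels[p] != 0) --p;
  return p;
}
```

   For rep levels `{0, 1, 1, 0, 1, 0}`, records start at indices 0, 3 and 5, so 
a page limited to 4 levels would have to end after the first 3.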


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
