wgtmac commented on PR #34054: URL: https://github.com/apache/arrow/pull/34054#issuecomment-1426803534
> > Please correct me if I am wrong. At least the arrow parquet writer guarantees this by calling ColumnWriter::WriteArrow like this: https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.cc#L154. Yes, the ParquetFileWriter itself does not prevent this.
>
> I think @etseidl is correct here, WriteArrow is working on leaf arrays and IIRC rep/def levels in the code referenced are the only way to recover record boundaries. Sorry, it's been a busy week; will aim to catch up on reviews next week. It would also be nice to not special-case this for Arrow even if it does somehow work there.

@emkornfield Thanks for the explanation! No problem, and this is not ready for review due to a series of blocking issues ahead. I strongly agree that writing via Arrow should not be a special case. It sounds like splitting pages at record boundaries is now a new blocking issue.
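
To illustrate the point about rep levels and record boundaries, here is a minimal sketch, not the parquet-cpp implementation. It relies only on the Parquet rule that a repetition level of 0 marks the first value of a new record. The helper names `RecordStarts` and `SplitPoint` are hypothetical and do not exist in Arrow; how a real writer would pick the `limit` position (for example from a page size threshold) is an assumption left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of parquet-cpp).
// Returns every index in `rep_levels` where a new record begins,
// i.e. every position whose repetition level is 0.
std::vector<std::size_t> RecordStarts(
    const std::vector<std::int16_t>& rep_levels) {
  std::vector<std::size_t> starts;
  for (std::size_t i = 0; i < rep_levels.size(); ++i) {
    if (rep_levels[i] == 0) starts.push_back(i);
  }
  return starts;
}

// Hypothetical helper (not part of parquet-cpp).
// Returns the largest index <= `limit` at which a record starts, so that
// a page cut there never splits a record. Returns -1 if the scanned range
// contains no record start (the batch continues an earlier record).
std::int64_t SplitPoint(const std::vector<std::int16_t>& rep_levels,
                        std::size_t limit) {
  if (rep_levels.empty()) return -1;
  // Clamp the preferred position to the last valid index.
  std::size_t i = limit < rep_levels.size() ? limit : rep_levels.size() - 1;
  while (true) {
    if (rep_levels[i] == 0) return static_cast<std::int64_t>(i);
    if (i == 0) return -1;
    --i;
  }
}
```

For a list column holding the records `[[1, 2], [3], [4, 5, 6]]`, the leaf values carry rep levels `0, 1, 0, 0, 1, 1`. Records therefore start at indices 0, 2 and 3, and a preferred cut at index 4 would be moved back to index 3.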
