bkietz opened a new issue, #6311: URL: https://github.com/apache/arrow-rs/issues/6311
**Describe the bug**

The format guarantees that each IPC file [embeds a valid IPC stream](https://github.com/apache/arrow/blob/bde6ac57ad943f5938506359de1b13fb85b4f8ea/docs/source/format/Columnar.rst) in order to allow readers to ignore the Footer, skip the file's leading magic, and reuse a stream reader. However, when writing IPC files arrow-rs aligns the encapsulated flatbuffers Messages to 64 byte boundaries instead of 8 bytes. This can leave gaps of padding bytes between the Messages which a stream reader would not know to skip.

**To Reproduce**

https://github.com/apache/arrow/pull/43834 adds validation of the embedded stream to arrow-c++'s `arrow-json-integration-test`. Running this against IPC files written by arrow-rs [raises an error](https://github.com/apache/arrow/actions/runs/10565413633/job/29269867382?pr=43834#step:9:27320):

```shell-session
$ arrow-json-integration-test -arrow datetime.arrow -json datetime.json -integration -mode VALIDATE
Error message: Invalid: Tried reading schema message, was null or length 0
```

**Expected behavior**

A stream reader should be able to read an IPC file by skipping the first 8 bytes.

**Additional context**

This was originally introduced in https://github.com/apache/arrow-rs/commit/eddef43d1cb46c1287da187ea1d86b0e1dc35a13 which added alignment to address new requirements around `i128`. However the alignment should not be applied to flatbuffers Messages; apart from the above issue I think there's no SIMD or other advantage to aligning those to more than 8 bytes. Body buffers can of course still be padded and aligned freely.

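To make the failure mode concrete, here is a minimal sketch in Python using only the standard library. It is a toy model, not arrow-rs's or arrow-c++'s actual code: `write_file_prefix` and `read_as_stream` are hypothetical names, the metadata is opaque bytes rather than real flatbuffers, message bodies and the Footer are omitted, and the writer is assumed to start every message on an `alignment` boundary without recording that padding in any length prefix. Only the framing (8-byte magic, `0xFFFFFFFF` continuation marker, little-endian `int32` metadata length, zero-length end-of-stream marker) follows the columnar format document.

```python
import struct

MAGIC = b"ARROW1\x00\x00"   # file magic, already padded to 8 bytes
CONTINUATION = 0xFFFFFFFF   # marker that precedes every encapsulated message


def pad_to(n, alignment):
    """Number of zero bytes needed to round n up to a multiple of alignment."""
    return (-n) % alignment


def write_file_prefix(metadata_blobs, alignment):
    """Toy writer: magic, then messages that each start on an `alignment` boundary."""
    out = bytearray(MAGIC)
    for meta in metadata_blobs:
        # Assumption: this gap is not covered by any length prefix.
        out += b"\x00" * pad_to(len(out), alignment)
        padded = meta + b"\x00" * pad_to(len(meta), 8)
        out += struct.pack("<Ii", CONTINUATION, len(padded))
        out += padded
    out += b"\x00" * pad_to(len(out), alignment)
    out += struct.pack("<Ii", CONTINUATION, 0)  # end-of-stream marker
    return bytes(out)


def read_as_stream(data):
    """Toy stream reader: skip the 8-byte magic, then read messages back to back."""
    pos = len(MAGIC)
    messages = []
    while True:
        marker, size = struct.unpack_from("<Ii", data, pos)
        if marker != CONTINUATION:
            raise ValueError(f"expected continuation marker at offset {pos}")
        pos += 8
        if size == 0:
            return messages
        messages.append(data[pos:pos + size])
        pos += size


blobs = [b"schema", b"batch-0"]

# 8-byte alignment leaves no gaps, so the embedded stream is readable.
print(read_as_stream(write_file_prefix(blobs, alignment=8)))

# 64-byte alignment leaves zero bytes after the magic; the reader trips on them.
try:
    read_as_stream(write_file_prefix(blobs, alignment=64))
except ValueError as err:
    print("stream read failed:", err)
```

In this model the 64-byte case fails on the very first message, because the reader finds zero bytes where it expects a message prefix right after the magic.
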
This was discovered while adding IPC file reading to nanoarrow; we were trying to defer reading Footers for a follow-up and discovered that the Go, Rust, and JavaScript implementations don't embed a valid stream. Most readers have not noticed because offsets and schemas are more efficiently read from a Footer and, once acquired, obviate sequential stream-style reading.
