willtemperley commented on code in PR #95:
URL: https://github.com/apache/arrow-swift/pull/95#discussion_r2428239254
##########
Arrow/Sources/Arrow/ArrowWriter.swift:
##########
@@ -138,18 +138,26 @@ public class ArrowWriter { // swiftlint:disable:this
type_body_length
var rbBlocks = [org_apache_arrow_flatbuf_Block]()
for batch in batches {
+ addPadForAlignment(&writer)
let startIndex = writer.count
switch writeRecordBatch(batch: batch) {
case .success(let rbResult):
withUnsafeBytes(of: CONTINUATIONMARKER.littleEndian)
{writer.append(Data($0))}
withUnsafeBytes(of: rbResult.1.o.littleEndian)
{writer.append(Data($0))}
writer.append(rbResult.0)
+ addPadForAlignment(&writer)
+ let metadataEnd = writer.count
+ let metadataLength = metadataEnd - startIndex
Review Comment:
Yes exactly. What's strange however is that unless the 8 byte prefix, i.e.:
```
<continuation: 0xFFFFFFFF>
<metadata_size: int32>
```
is included in the metadata length, the file is invalid according to PyArrow.
Doing a small experiment with the example in testFileWriter_bool, writing
the block metadata like this, without the 8 byte prefix in the metadataLength:
offset: 120
metadataLength: 208
bodyLength: 296
PyArrow throws an error:
pyarrow.lib.ArrowInvalid: flatbuffer size 8 invalid. File offset: 128,
metadata length: 208
However if the metadataLength includes the 8 byte prefix, i.e.:
offset: 120
metadataLength: 216
bodyLength: 296
The file is valid according to PyArrow. I think this might be missing from
the spec. Or maybe a bug in the C++ implementation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]