willtemperley commented on code in PR #95:
URL: https://github.com/apache/arrow-swift/pull/95#discussion_r2428228920
##########
Arrow/Sources/Arrow/ArrowWriter.swift:
##########
@@ -138,18 +138,26 @@ public class ArrowWriter { // swiftlint:disable:this
type_body_length
var rbBlocks = [org_apache_arrow_flatbuf_Block]()
for batch in batches {
+ addPadForAlignment(&writer)
let startIndex = writer.count
switch writeRecordBatch(batch: batch) {
case .success(let rbResult):
withUnsafeBytes(of: CONTINUATIONMARKER.littleEndian)
{writer.append(Data($0))}
withUnsafeBytes(of: rbResult.1.o.littleEndian)
{writer.append(Data($0))}
writer.append(rbResult.0)
+ addPadForAlignment(&writer)
+ let metadataEnd = writer.count
+ let metadataLength = metadataEnd - startIndex
Review Comment:
Yes exactly. What's strange however is that unless the 8 byte prefix:
```
<continuation: 0xFFFFFFFF>
<metadata_size: int32>
```
isn't included in the metadata length, the file is invalid according to
PyArrow.
Doing a small experiment with the example in testFileWriter_bool, writing
the block metadata like this:
offset: 120
metadataLength: 208
bodyLength: 296
PyArrow throws an error:
pyarrow.lib.ArrowInvalid: flatbuffer size 8 invalid. File offset: 128,
metadata length: 208
However if the metadataLength includes the 8 byte prefix, i.e.:
offset: 120
metadataLength: 216
bodyLength: 296
The file is valid according to PyArrow.
This could be a bug in C++ but either way it probably should be in the spec
because must be data out in the wild with this structure.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]