willtemperley commented on code in PR #95:
URL: https://github.com/apache/arrow-swift/pull/95#discussion_r2428228920


##########
Arrow/Sources/Arrow/ArrowWriter.swift:
##########
@@ -138,18 +138,26 @@ public class ArrowWriter { // swiftlint:disable:this 
type_body_length
         var rbBlocks = [org_apache_arrow_flatbuf_Block]()
 
         for batch in batches {
+            addPadForAlignment(&writer)
             let startIndex = writer.count
             switch writeRecordBatch(batch: batch) {
             case .success(let rbResult):
                 withUnsafeBytes(of: CONTINUATIONMARKER.littleEndian) 
{writer.append(Data($0))}
                 withUnsafeBytes(of: rbResult.1.o.littleEndian) 
{writer.append(Data($0))}
                 writer.append(rbResult.0)
+                addPadForAlignment(&writer)
+                let metadataEnd = writer.count
+                let metadataLength = metadataEnd - startIndex

Review Comment:
   Yes exactly. What's strange however is that unless the 8 byte prefix:
   ```
   <continuation: 0xFFFFFFFF>
   <metadata_size: int32>
   ```
   isn't included in the metadata length, the file is invalid according to 
PyArrow.
   
   Doing a small experiment with the example in testFileWriter_bool, writing 
the block metadata like this:
   
   offset: 120
   metadataLength: 208
   bodyLength: 296
   
   PyArrow throws an error:
   pyarrow.lib.ArrowInvalid: flatbuffer size 8 invalid. File offset: 128, 
metadata length: 208
   
   However if the metadataLength includes the 8 byte prefix, i.e.:
   
   offset: 120
   metadataLength: 216
   bodyLength: 296
   
   The file is valid according to PyArrow.
   
   This could be a bug in C++ but either way it probably should be in the spec 
because must be data out in the wild with this structure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to