Hello there, I've written something that behaves similarly to:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L73

except that, for proof-of-concept purposes, it transforms a list of Java objects 
into a byte[] payload.  The ArrowFileWriter log statements indicate that data 
is getting written to the output stream:

17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 6
17:53:16.759 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 2
17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 288
17:53:16.766 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
17:53:16.769 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 24
17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
17:53:16.770 [main] DEBUG org.apache.arrow.vector.schema.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 12
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 216
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 24
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 1
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 7
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 12
17:53:16.771 [main] DEBUG org.apache.arrow.vector.file.WriteChannel - Writing buffer with size: 4
17:53:16.772 [main] DEBUG org.apache.arrow.vector.file.ArrowWriter - RecordBatch at 304, metadata: 224, body: 56
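
In case it helps, the write path is roughly the following.  This is a minimal 
sketch rather than my exact code: it assumes the VectorSchemaRoot has already 
been populated from the Java objects, that nothing is dictionary-encoded (hence 
the null provider), and that ArrowFileWriter lives under 
org.apache.arrow.vector.file as in the version whose loggers appear above.

import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;

import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.file.ArrowFileWriter;

public final class WriteSketch {

    // Serialize the current contents of a populated VectorSchemaRoot
    // into a byte[] using the Arrow file format.
    static byte[] rootToByteArray(VectorSchemaRoot root) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ArrowFileWriter writer =
                new ArrowFileWriter(root, null, Channels.newChannel(out));
        try {
            writer.start();       // schema header
            writer.writeBatch();  // one record batch with the root's current rows
            writer.end();         // file footer
        } finally {
            writer.close();
        }
        return out.toByteArray();
    }
}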


However, when I wrap that payload in a ByteArrayReadableSeekableByteChannel 
and use ArrowFileReader (along with a BufferAllocator) to read it back, 
ArrowFileReader complains that the input is not a valid Arrow file, right at 
the point where I call reader.getVectorSchemaRoot():

Exception in thread "main" org.apache.arrow.vector.file.InvalidArrowFileException: missing Magic number [0, 0, 42, 0, 0, 0, 0, 0, 0, 0]
        at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:66)
        at org.apache.arrow.vector.file.ArrowFileReader.readSchema(ArrowFileReader.java:37)
        at org.apache.arrow.vector.file.ArrowReader.initialize(ArrowReader.java:162)
        at org.apache.arrow.vector.file.ArrowReader.ensureInitialized(ArrowReader.java:153)
        at org.apache.arrow.vector.file.ArrowReader.getVectorSchemaRoot(ArrowReader.java:67)
        at com.bloomberg.andrew.sql.execution.arrow.ArrowConverters.byteArrayToBatch(ArrowConverters.java:89)
        at com.bloomberg.andrew.sql.execution.arrow.ArrowPayload.loadBatch(ArrowPayload.java:18)
        at com.bloomberg.andrew.test.arrow.ArrowPublisher.main(ArrowPublisher.java:28)
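
For reference, the read side is essentially this (again a sketch rather than 
my exact code, where payload stands for the byte[] produced by the write path 
above):

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.file.ArrowFileReader;
import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;

public final class ReadSketch {

    // Deserialize an Arrow-file-format byte[] back into a VectorSchemaRoot.
    static void readBack(byte[] payload) throws Exception {
        BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
        ArrowFileReader reader = new ArrowFileReader(
                new ByteArrayReadableSeekableByteChannel(payload), allocator);
        try {
            // This is the call that throws InvalidArrowFileException for me.
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            reader.loadNextBatch();  // load the first (and only) record batch
            System.out.println("rows: " + root.getRowCount());
        } finally {
            reader.close();
            allocator.close();
        }
    }
}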


I'm noticing that the 42 in the reported magic number is exactly the value of 
the very last field/member of the very last object in our list (equivalently, 
the last column of the last row of our table), and that pattern holds across 
every case I've tried.  Clearly I'm writing something to the output stream, 
but does anyone have ideas as to why ArrowReader is struggling?  There are 
some ideas floating around about big-endian/little-endian issues, but I'm not 
sure whether that has been addressed or not.  Thanks!
