Hm, that is puzzling. It makes it seem as if there is something wrong with the Julia FlatBuffers package, since it is definitely reading a message (seemingly properly) for me at byte 20 and not at byte 4 as expected.
In regard to moving my project to the mono-repo: I have only just recently undertaken completing my package so that it implements the full standard along with IPC. Once that is complete, I expect to have something that is more suitable for integration into the mono-repo. When the time comes, I'd be happy to facilitate that.

This particular issue has piqued concerns I already had about the maturity of the FlatBuffers.jl package. It is starting to look like I'll have to do at least a little bit of work on that package as well; let's see how involved it is.

I will make a post back in the original issue so that this information can be found there as well.

Thanks for your help!

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, March 18, 2019 10:11 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi,
>
> I added some print statements to illustrate the flow of parsing the
> stream in the example you gave
>
> $ python test.py
> File is at offset: 0
> Message length: 140
> About to read body, file at offset: 144
> Read message body, file at offset: 144
> Opening a Message flatbuffer with size 140
> File is at offset: 144
> Message length: 140
> About to read body, file at offset: 288
> Read message body, file at offset: 320
> Opening a Message flatbuffer with size 140
> File is at offset: 320
>
> So it seems the Flatbuffers library recognizes bytes 4 through 144 as a
> Message
>
> I put my branch here:
> https://github.com/wesm/arrow/tree/ipc-debug-print-20190318
>
> The test.py is here
> https://gist.github.com/wesm/dd40aa3196cd138e883d94c574d154f9
>
> BTW can you comment on
> https://github.com/ExpandingMan/Arrow.jl/issues/28? I would like to
> see a Julia implementation inside the Apache Arrow project.
>
> Thanks
>
> Wes
>
> On Mon, Mar 18, 2019 at 7:58 PM Expanding Man
> expanding...@protonmail.com.invalid wrote:
>
> > Hello all, I am working on a pure Julia implementation of the arrow
> > standard. Currently I am working on ingesting the metadata, and it seems to
> > me that the output I'm creating with `pyarrow` is not matching the format,
> > so I'm trying to figure out where I've misunderstood it.
> >
> > I've written some arrow data to disk with the code you can find in this
> > gist.
> >
> > Reading the format, I expect each message to start with an `Int32` giving
> > the size of the metadata flatbuffers, followed by the metadata flatbuffers
> > themselves. The `Int32`'s indeed seem to be there, however the `Message`
> > flatbuffers do not start where I expect. On the output from above, I find
> > the first flatbuffers containing the `Message` with the `Schema` at byte
> > 20. I am successfully able to construct all flatbuffer objects in Julia
> > from byte 20, but I was expecting to find this flatbuffer at byte 4
> > immediately following the `Int32`. What is contained in bytes 4 to 19?
> >
> > Similarly, I can find the next `Int32` at byte 144 as expected, however I
> > can't find the flatbuffers after that until byte 168. Again, I can
> > successfully construct the metadata flatbuffers (in this case a `Message`
> > containing a `RecordBatch`) in Julia, but I was expecting to do this from
> > byte 148, not byte 168. What is contained in bytes 144 to 168? Note that
> > this is now a 24 byte boundary, whereas for the first `Message` it was
> > only 16.
> >
> > What am I missing here? I have a suspicion that there is a small flatbuffer
> > of some sort being contained in the mysterious extra bytes, but the format
> > description makes no mention of that.
> >
> > Thanks!
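
For anyone who wants to compare the two offsets under discussion, here is a minimal Python sketch of the framing described in this thread: an `Int32` length prefix followed by the metadata flatbuffer. It also reads the first four bytes of the flatbuffer, which in the FlatBuffers encoding hold an offset to the root table rather than the table itself. Whether that accounts for the gap between byte 4 and byte 20 is an assumption and is not confirmed anywhere in this thread. The helper name `locate_message` and the synthetic bytes are hypothetical; the root offset of 16 is chosen only to reproduce the reported numbers, not read from the actual file.

```python
import struct


def locate_message(buf: bytes, offset: int = 0):
    """Return (metadata_len, flatbuffer_start, root_table_pos) for the
    length-prefixed message that begins at `offset` in `buf`."""
    # Little-endian Int32 giving the size of the metadata flatbuffer.
    (metadata_len,) = struct.unpack_from("<i", buf, offset)
    flatbuffer_start = offset + 4
    # A flatbuffer begins with a little-endian uint32 offset to its root
    # table, measured from the first byte of the flatbuffer.
    (root_offset,) = struct.unpack_from("<I", buf, flatbuffer_start)
    return metadata_len, flatbuffer_start, flatbuffer_start + root_offset


# Synthetic stand-in for the first message, NOT the real file contents:
# a 140-byte flatbuffer whose root offset is assumed to be 16.
fake = struct.pack("<i", 140) + struct.pack("<I", 16) + bytes(136)
print(locate_message(fake))  # (140, 4, 20)
```

Running the same function against the real file at offsets 0 and 144 would show whether the values stored at bytes 4 and 148 point at bytes 20 and 168 respectively.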