Re: Help Understanding Streaming Format

Expanding Man Mon, 18 Mar 2019 19:26:49 -0700

Hm, that is puzzling. It makes it seem as if something is wrong with the
Julia FlatBuffers package, since it is definitely reading a message (seemingly
properly) for me at byte 20, not at byte 4 as expected.
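A FlatBuffer does not have to begin with its root table. In the FlatBuffers binary format, the first four bytes of a buffer are an unsigned little-endian offset to the root table, measured from the start of the buffer, and the builder often writes a table's vtable at lower addresses than the table itself. The minimal Python sketch below walks a stream laid out as described further down in this thread (an `Int32` metadata length, then the `Message` flatbuffer, then the body). It assumes that a zero length marks end-of-stream and it does not handle the continuation marker that later Arrow versions added. For each message it reports where the flatbuffer starts and where its root table actually sits. The names `walk_stream`, `read_field_offset` and `MessageInfo` are hypothetical, and the field id used for `bodyLength` is an assumption based on the field order in Arrow's `Message.fbs`.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical helper names: a sketch, not part of pyarrow or FlatBuffers.jl.

# Assumption, from the field order in Arrow's Message.fbs: version=0,
# header_type=1 (union tag), header=2, bodyLength=3.
BODY_LENGTH_FIELD_ID = 3


@dataclass
class MessageInfo:
    offset: int            # position of the Int32 length prefix
    metadata_length: int   # value of that prefix
    flatbuffer_start: int  # first byte of the Message flatbuffer
    root_table: int        # absolute position of the Message table itself
    body_start: int        # first byte after the metadata
    body_length: int       # bodyLength field (0 if absent)


def read_field_offset(buf: bytes, table_pos: int, field_id: int) -> Optional[int]:
    """Return the absolute position of a table field, or None if it is absent."""
    # A table starts with a signed 32-bit offset: vtable = table - soffset.
    (soffset,) = struct.unpack_from("<i", buf, table_pos)
    vtable_pos = table_pos - soffset
    (vtable_size,) = struct.unpack_from("<H", buf, vtable_pos)
    # Skip the two uint16 header entries (vtable size, inline table size).
    slot = 4 + 2 * field_id
    if slot >= vtable_size:
        return None
    (field_off,) = struct.unpack_from("<H", buf, vtable_pos + slot)
    return table_pos + field_off if field_off else None


def walk_stream(buf: bytes) -> List[MessageInfo]:
    """Walk <Int32 length><Message flatbuffer><body> records until EOS."""
    messages = []
    pos = 0
    while pos + 4 <= len(buf):
        (metadata_length,) = struct.unpack_from("<i", buf, pos)
        if metadata_length == 0:  # assumed end-of-stream marker
            break
        if metadata_length < 0:
            raise ValueError("negative length: continuation marker not handled")
        fb_start = pos + 4
        # The first 4 bytes of a FlatBuffer are a uoffset to the root table,
        # relative to the start of the buffer, so the table can sit later.
        (root_offset,) = struct.unpack_from("<I", buf, fb_start)
        root_table = fb_start + root_offset
        field_pos = read_field_offset(buf, root_table, BODY_LENGTH_FIELD_ID)
        body_length = 0
        if field_pos is not None:
            (body_length,) = struct.unpack_from("<q", buf, field_pos)
        body_start = fb_start + metadata_length
        messages.append(MessageInfo(pos, metadata_length, fb_start,
                                    root_table, body_start, body_length))
        pos = body_start + body_length
    return messages
```

To try it, pass the raw bytes of a stream file (for example `walk_stream(open(path, "rb").read())`, with `path` pointing at your file). Comparing `flatbuffer_start` with `root_table` for each message shows how many bytes sit between the length prefix and the `Message` table.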


Regarding moving my project into the mono-repo: I only recently started
completing my package so that it implements the full standard, including IPC.
Once that is done, I expect to have something more suitable for integration
into the mono-repo, and when the time comes I'd be happy to help with that.

This particular issue has heightened concerns I already had about the maturity
of the FlatBuffers.jl package. It is starting to look like I'll have to do at
least a little work on that package as well; let's see how involved it is.

I will make a post back in the original issue so that this information can be 
found there as well.

Thanks for your help!




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, March 18, 2019 10:11 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi,
>
> I added some print statements to illustrate the flow of parsing the
> stream in the example you gave:
>
> $ python test.py
> File is at offset: 0
> Message length: 140
> About to read body, file at offset: 144
> Read message body, file at offset: 144
> Opening a Message flatbuffer with size 140
> File is at offset: 144
> Message length: 140
> About to read body, file at offset: 288
> Read message body, file at offset: 320
> Opening a Message flatbuffer with size 140
> File is at offset: 320
>
> So it seems the FlatBuffers library recognizes bytes 4 through 144 as a
> Message.
>
> I put my branch here:
> https://github.com/wesm/arrow/tree/ipc-debug-print-20190318
>
> The test.py is here:
> https://gist.github.com/wesm/dd40aa3196cd138e883d94c574d154f9
>
> BTW can you comment on
> https://github.com/ExpandingMan/Arrow.jl/issues/28? I would like to
> see a Julia implementation inside the Apache Arrow project.
>
> Thanks
>
> Wes
>
> On Mon, Mar 18, 2019 at 7:58 PM Expanding Man
> <expanding...@protonmail.com.invalid> wrote:
>
> > Hello all, I am working on a pure Julia implementation of the Arrow
> > standard. Currently I am working on ingesting the metadata, and it seems to
> > me that the output I'm creating with `pyarrow` does not match the format,
> > so I'm trying to figure out where I've misunderstood it.
> > I've written some Arrow data to disk with the code you can find in this
> > gist.
> > Reading the format, I expect each message to start with an `Int32` giving
> > the size of the metadata flatbuffer, followed by the metadata flatbuffer
> > itself. The `Int32`s do seem to be there; however, the `Message`
> > flatbuffers do not start where I expect. In the output from the code above,
> > I find the first flatbuffer, containing the `Message` with the `Schema`, at
> > byte 20. I am able to construct all of the flatbuffer objects in Julia
> > starting from byte 20, but I was expecting to find this flatbuffer at byte 4,
> > immediately following the `Int32`. What is contained in bytes 4 to 19?
> > Similarly, I find the next `Int32` at byte 144 as expected, but I can't
> > find the flatbuffer after it until byte 168. Again, I can successfully
> > construct the metadata flatbuffer (in this case a `Message` containing a
> > `RecordBatch`) in Julia, but I was expecting to do this from byte 148, not
> > byte 168. What is contained in bytes 148 to 167? Note that this gap is
> > now 20 bytes, whereas for the first `Message` it was only 16.
> > What am I missing here? I suspect that some small flatbuffer is stored in
> > the mysterious extra bytes, but the format description makes no mention of
> > one.
> > Thanks!
