> I just thought of one other requirement: the format needs to support
> arbitrary byte sequences.

Can you clarify why this is needed? Is it that custom_metadata maps
should allow byte sequences as values?

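For concreteness, here is a minimal sketch of the base64 workaround mentioned in the quoted message, assuming the question is about a string-to-bytes metadata map. The helper names `encode_metadata` and `decode_metadata` and the sample map are hypothetical and not part of any Arrow API; the sketch only illustrates the extra transformation layer a JSON-based format would need.

```python
import base64
import json

# Hypothetical metadata map; the value is not valid UTF-8, so it cannot
# be stored in a JSON string without some transformation.
custom_metadata = {"key": b"\x00\xff\xfe raw bytes"}


def encode_metadata(metadata):
    """Serialize a str -> bytes map as JSON, base64-encoding each value."""
    return json.dumps(
        {k: base64.b64encode(v).decode("ascii") for k, v in metadata.items()}
    )


def decode_metadata(text):
    """Inverse of encode_metadata: parse JSON, then base64-decode each value."""
    return {k: base64.b64decode(v) for k, v in json.loads(text).items()}


# Passing bytes straight to json.dumps fails with a TypeError.
try:
    json.dumps(custom_metadata)
    direct_ok = True
except TypeError:
    direct_ok = False

# With the base64 layer, the bytes survive a round trip unchanged.
round_tripped = decode_metadata(encode_metadata(custom_metadata))
```

Both producer and consumer have to agree on which fields are base64-encoded, which is the additional complexity being weighed against a binary format.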
On Fri, Aug 13, 2021 at 10:00 AM Phillip Cloud <cpcl...@gmail.com> wrote:

> On Fri, Aug 13, 2021 at 11:43 AM Antoine Pitrou <anto...@python.org>
> wrote:
>
> > On 13/08/2021 at 17:35, Phillip Cloud wrote:
> > >
> > >> I.e. make the ability to read and write by humans be more
> > >> important than speed of validation.
> > >
> > > I think I differ on whether the IR should be easy to read and
> > > write by humans. IR is going to be predominantly read and written
> > > by machines, though of course we will need a way to inspect it
> > > for debugging.
> >
> > But the code executed by machines is written by humans. I think
> > that's mostly where the contention resides: is it easy to code, in
> > any given language, the routines required to produce or consume
> > the IR?
>
> Definitely not for flatbuffers, since flatbuffers is IMO annoying to
> use in any language except C++, and it's borderline annoying there
> too. Protobuf is similar (less annoying in Rust, but still annoying
> in Python and C++ IMO), though I think any binary format is going to
> be less human-friendly, by construction.
>
> If we were to use something like JSON or msgpack, can someone sketch
> out the interaction between the IR and the rest of Arrow's type
> system?
>
> Would we need a JSON-encoded-arrow-type -> in-memory representation
> for an Arrow type in a given language?
>
> I just thought of one other requirement: the format needs to support
> arbitrary byte sequences. JSON doesn't support untransformed byte
> sequences, though it's not uncommon to base64-encode a byte sequence.
> IMO that adds an unnecessary layer of complexity, which is another
> tradeoff to consider.