On Fri, Aug 13, 2021 at 11:43 AM Antoine Pitrou <anto...@python.org> wrote:
> On 13/08/2021 at 17:35, Phillip Cloud wrote:
> >
> >> I.e. make the ability to read and write by humans be more important than
> >> speed of validation.
> >
> > I think I differ on whether the IR should be easy to read and write by
> > humans.
> > IR is going to be predominantly read and written by machines, though of
> > course we will need a way to inspect it for debugging.
>
> But the code executed by machines is written by humans. I think that's
> mostly where the contention resides: is it easy to code, in any given
> language, the routines required to produce or consume the IR?
>

Definitely not for flatbuffers, since flatbuffers is IMO annoying to use in any language except C++, and it's borderline annoying there too. Protobuf is similar (less annoying in Rust, but still annoying in Python and C++ IMO), though I think any binary format is going to be less human-friendly, by construction.

If we were to use something like JSON or msgpack, can someone sketch out the interaction between the IR and the rest of Arrow's type system? Would we need a converter from a JSON-encoded Arrow type to the in-memory representation of that type in each language?

I just thought of one other requirement: the format needs to support arbitrary byte sequences. JSON doesn't support untransformed byte sequences, though it's not uncommon to base64-encode them. IMO that adds an unnecessary layer of complexity, which is another tradeoff to consider.
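To make the base64 tradeoff concrete, here is a minimal Python sketch of the extra layer a JSON-based IR would need for binary data. It assumes a hypothetical JSON layout for an IR literal node: the `literal`/`type`/`value` keys and the `encode_binary_literal`/`decode_binary_literal` helpers are invented for illustration and are not part of any Arrow spec. It uses only the standard `json` and `base64` modules.

```python
import base64
import json


# Hypothetical IR node layout, for illustration only; not part of any Arrow spec.
def encode_binary_literal(value: bytes) -> str:
    """Serialize a binary literal to JSON, base64-encoding the raw bytes."""
    node = {
        "literal": {
            "type": "binary",
            # JSON strings hold Unicode text, not raw bytes, so the bytes
            # must be transformed (here: standard base64) before embedding.
            "value": base64.b64encode(value).decode("ascii"),
        }
    }
    return json.dumps(node)


def decode_binary_literal(text: str) -> bytes:
    """Parse the JSON produced above and recover the original bytes."""
    node = json.loads(text)
    literal = node["literal"]
    if literal["type"] != "binary":
        raise ValueError(f"expected a binary literal, got {literal['type']!r}")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(literal["value"], validate=True)


if __name__ == "__main__":
    raw = bytes([0x00, 0xFF, 0x80, 0x0A])  # not valid UTF-8
    encoded = encode_binary_literal(raw)
    print(encoded)
    assert decode_binary_literal(encoded) == raw
```

Every producer and consumer, in every language, has to agree on this extra encode/decode step (and on which base64 variant to use). By contrast, MessagePack has a native binary type (the `bin` format family), so raw bytes can be stored without that additional transformation.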