On 11/08/2021 at 23:06, Phillip Cloud wrote:
> On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou <anto...@python.org> wrote:
>> On 11/08/2021 at 22:16, Phillip Cloud wrote:
>>> Yeah, that is a drawback here, though I don't see needing to run
>>> flatc as a major downside given the upside of not having to write
>>> additional code to move between formats.
>> That's only an advantage if you already know how to read the Arrow IPC
>> format (and, yes, in this case you already run `flatc`). Some projects
>> probably don't care about Arrow IPC (Dask, for example).
> I don't think it's about the IPC though, at least for the compute IR
> use case. Am I missing something there?
If you're not handling the Arrow IPC format, then you probably don't
have an encoder/decoder for Schema.fbs, so the "upside of not having to
write additional code to move between formats" doesn't exist (unless I'm
misunderstanding your point?).
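To make that concrete: a project that doesn't already consume Arrow IPC
would first have to run `flatc --python Schema.fbs` (or the equivalent
for its language) and vendor the generated classes, then write decoding
code along these lines (an untested sketch; the
`org.apache.arrow.flatbuf` module path follows from the namespace
declared in `Schema.fbs`):

# Requires the `flatbuffers` runtime package plus the classes
# generated by `flatc --python Schema.fbs`.
from org.apache.arrow.flatbuf.Schema import Schema

def field_names(buf):
    # Decode a Flatbuffers-encoded Schema and list its top-level fields.
    schema = Schema.GetRootAsSchema(buf, 0)
    return [schema.Fields(i).Name().decode()
            for i in range(schema.FieldsLength())]

None of this is hard, but it is precisely the "additional code" (plus a
build step) that such a project doesn't have today.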
> I do think a downside of not using something like JSON or msgpack is
> that schema validation must be implemented by both the producer and
> the consumer. That means we'd have a number of other consequential
> decisions to make:
> * Do we provide the validation library?
> * If not, do all the languages arrow supports have high quality
>   libraries for validating schemas?
> * If so, then we have to implement/maintain/release/bugfix that.
This is true. However, Flatbuffers doesn't validate much on its own,
either, because its IDL is not expressive enough. For example,
`Schema.fbs` allows you to declare an INT8 field with children, a LIST
field without any children, a non-nullable NULL field...
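In practice, that means a consumer that wants those invariants checked
has to hand-write them on top of whatever Flatbuffers decodes. A rough
sketch of the kind of code involved, over a simplified dict
representation rather than real generated classes:

def validate_field(field):
    # Enforce invariants that the Flatbuffers IDL cannot express.
    type_name = field["type"]             # e.g. "int8", "list", "null"
    children = field.get("children", [])
    if type_name == "int8" and children:
        raise ValueError("INT8 field must not have children")
    if type_name == "list" and len(children) != 1:
        raise ValueError("LIST field must have exactly one child")
    if type_name == "null" and not field.get("nullable", True):
        raise ValueError("NULL field must be nullable")
    for child in children:
        validate_field(child)

So that burden falls on producers and consumers either way.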
(also, there's JSON Schema: https://json-schema.org/)
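For instance, with the Python `jsonschema` package, producer and
consumer could share one declarative schema (the shape below is
illustrative only, not a proposal):

import jsonschema

FIELD_SCHEMA = {
    "type": "object",
    "required": ["name", "type"],
    "properties": {
        "name": {"type": "string"},
        "type": {"type": "string"},
        "nullable": {"type": "boolean"},
        # "$ref": "#" makes the field definition recursive
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

jsonschema.validate({"name": "x", "type": "int8"}, FIELD_SCHEMA)  # ok
jsonschema.validate({"name": "x"}, FIELD_SCHEMA)  # ValidationError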
Regards
Antoine.