On 11/08/2021 at 23:06, Phillip Cloud wrote:
> On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou <anto...@python.org> wrote:
>> On 11/08/2021 at 22:16, Phillip Cloud wrote:
>>> Yeah, that is a drawback here, though I don't see needing to run
>>> flatc as a major downside given the upside of not having to write
>>> additional code to move between formats.
>> That's only an advantage if you already know how to read the Arrow IPC
>> format (and, yes, in this case you already run `flatc`). Some projects
>> probably don't care about Arrow IPC (Dask, for example).
> I don't think it's about the IPC though, at least for the compute IR
> use case. Am I missing something there?
If you're not handling the Arrow IPC format, then you probably don't
have an encoder/decoder for Schema.fbs, so the "upside of not having to
write additional code to move between formats" doesn't exist (unless I'm
misunderstanding your point?).
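To make that concrete: a project that doesn't already consume Arrow IPC
would first have to run `flatc --python Schema.fbs` (or the equivalent
for its language) and vendor the generated classes, then write decoding
code along these lines (an untested sketch; the
`org.apache.arrow.flatbuf` module path follows from the namespace
declared in `Schema.fbs`):

# Requires the `flatbuffers` runtime package plus the classes
# generated by `flatc --python Schema.fbs`.
from org.apache.arrow.flatbuf.Schema import Schema

def field_names(buf):
    # Decode a Flatbuffers-encoded Schema and list its top-level fields.
    schema = Schema.GetRootAsSchema(buf, 0)
    return [schema.Fields(i).Name().decode()
            for i in range(schema.FieldsLength())]

None of this is hard, but it is precisely the "additional code" (plus a
build step) that such a project doesn't have today.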
> I do think a downside of not using something like JSON or msgpack is
> that schema validation must be implemented by both the producer and
> the consumer. That means we'd have a number of other consequential
> decisions to make:
> * Do we provide the validation library?
> * If not, do all the languages arrow supports have high quality
>   libraries for validating schemas?
> * If so, then we have to implement/maintain/release/bugfix that.
This is true. However, Flatbuffers doesn't validate much on its own,
either, because its IDL is not expressive enough. For example,
`Schema.fbs` allows you to declare an INT8 field with children, a LIST
field without any children, a non-nullable NULL field...
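In practice, that means a consumer that wants those invariants checked
has to hand-write them on top of whatever Flatbuffers decodes. A rough
sketch of the kind of code involved, over a simplified dict
representation rather than real generated classes:

def validate_field(field):
    # Enforce invariants that the Flatbuffers IDL cannot express.
    type_name = field["type"]             # e.g. "int8", "list", "null"
    children = field.get("children", [])
    if type_name == "int8" and children:
        raise ValueError("INT8 field must not have children")
    if type_name == "list" and len(children) != 1:
        raise ValueError("LIST field must have exactly one child")
    if type_name == "null" and not field.get("nullable", True):
        raise ValueError("NULL field must be nullable")
    for child in children:
        validate_field(child)

So that burden falls on producers and consumers either way.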
(also, there's JSON Schema: https://json-schema.org/)
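For instance, with the Python `jsonschema` package, producer and
consumer could share one declarative schema (the shape below is
illustrative only, not a proposal):

import jsonschema

FIELD_SCHEMA = {
    "type": "object",
    "required": ["name", "type"],
    "properties": {
        "name": {"type": "string"},
        "type": {"type": "string"},
        "nullable": {"type": "boolean"},
        # "$ref": "#" makes the field definition recursive
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
}

jsonschema.validate({"name": "x", "type": "int8"}, FIELD_SCHEMA)  # ok
jsonschema.validate({"name": "x"}, FIELD_SCHEMA)  # ValidationError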
Regards
Antoine.