Re: [DISCUSS] Splitting out the Arrow format directory

Phillip Cloud Wed, 11 Aug 2021 14:06:53 -0700

On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou <anto...@python.org> wrote:


>
> On 11/08/2021 at 22:16, Phillip Cloud wrote:
> >
> > Yeah, that is a drawback here, though I don't see needing to run flatc as a
> > major downside given the upside of not having to write additional code to
> > move between formats.
>
> That's only an advantage if you already know how to read the Arrow IPC
> format (and, yes, in this case you already run `flatc`).  Some projects
> probably don't care about Arrow IPC (Dask, for example).


I don't think it's about the IPC though, at least for the compute IR use case.
Am I missing something there?

I do think a downside of using something like JSON or msgpack is
that schema validation must be implemented by both the producer and the
consumer.
That means we'd have a number of other consequential decisions to make:

* Do we provide the validation library?
* If not, do all the languages Arrow supports have high-quality libraries
  for validating schemas?
* If so, then we have to implement/maintain/release/bugfix that.

This isn't the case with flatbuffers or protobuf, since they have already
done the work: messages they produce are valid against the schema by
definition.
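
To make the validation burden concrete, here is a minimal sketch of the kind
of hand-written check that both producer and consumer would need for a
JSON-encoded message. The message shape (`op`, `args`) and the function name
are made up for illustration; they are not part of any Arrow format.

```python
import json

# Hypothetical message shape, for illustration only (not the Arrow compute IR).
REQUIRED_FIELDS = {"op": str, "args": list}


def validate_message(raw: str) -> dict:
    """Parse a JSON message and check it against the hand-written shape above."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in msg:
            raise ValueError(f"missing field: {name}")
        if not isinstance(msg[name], expected_type):
            raise ValueError(f"field {name!r} must be {expected_type.__name__}")
    return msg
```

Every language that reads or writes the message would need an equivalent of
this, kept in sync by hand or through a separate schema-validation library.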


>
> > Is there something particularly onerous about needing to run a codegen step
> > in a build process (other than it being build-step number 1000 in a death
> > by 1000 build-steps scenario)?
>
> Most Python packages (except perhaps Numpy, Pandas, PyArrow...) have a
> very simple build configuration.  Adding an external command in the mix
> (that needs a non-standard dependency) isn't trivial.
>

I don't find this too compelling. One language's lack of modern dependency
management tooling and refusal to make it easy to run external tools during
that process doesn't seem like a strong reason to rule out flatbuffers here.

I want to support everyone as best we can, but any choice we make here
will have some tradeoffs. I see not being able to share the exact same
schema and type information as a huge downside relative to the cost
of having to run a binary during a build process.

To be clear, users should _definitely_ not be running flatc; it's only
library authors that should be running it as part of a
development/build/release cycle.
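
As a rough sketch of what that step could look like for a Python library
author, assuming `flatc` is installed and on PATH: the helper names and the
paths in the usage note are hypothetical, while `--python` and `-o` are
flatc's options for selecting the Python generator and the output directory.

```python
import subprocess
from pathlib import Path
from typing import List


def flatc_command(schema: Path, out_dir: Path) -> List[str]:
    """Build the flatc invocation that generates Python code from a schema."""
    return ["flatc", "--python", "-o", str(out_dir), str(schema)]


def run_codegen(schema: Path, out_dir: Path) -> None:
    """Run flatc at development/release time; requires flatc on PATH."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(flatc_command(schema, out_dir), check=True)
```

A library author might call `run_codegen(Path("Schema.fbs"), Path("generated"))`
before building a release and ship the generated modules, so that users
installing the package never need flatc themselves.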


>
> Regards
>
> Antoine.
>
