On Wed, Aug 11, 2021 at 4:22 PM Antoine Pitrou <anto...@python.org> wrote:
>
> On 11/08/2021 at 22:16, Phillip Cloud wrote:
> >
> > Yeah, that is a drawback here, though I don't see needing to run flatc as a
> > major downside given the upside
> > of not having to write additional code to move between formats.
>
> That's only an advantage if you already know how to read the Arrow IPC
> format (and, yes, in this case you already run `flatc`). Some projects
> probably don't care about Arrow IPC (Dask, for example).

I don't think it's about the IPC though, at least for the compute IR use
case. Am I missing something there?

I do think a downside of using something like JSON or msgpack is that
schema validation must be implemented by both the producer and the
consumer. That means we'd have a number of other consequential decisions
to make:

* Do we provide the validation library?
* If not, do all the languages Arrow supports have high quality libraries
  for validating schemas?
* If so, then we have to implement/maintain/release/bugfix that.

This isn't the case with flatbuffers or protobufs, since they have already
done the work of producing schema-valid messages by construction.

>
> > Is there something particularly onerous about needing to run a codegen step
> > in a build process
> > (other than it being build-step number 1000 in a death by 1000 build-steps
> > scenario)?
>
> Most Python packages (except perhaps NumPy, Pandas, PyArrow...) have a
> very simple build configuration. Adding an external command in the mix
> (that needs a non-standard dependency) isn't trivial.
>

I don't find this too compelling. One language's lack of modern dependency
management tooling, and its refusal to make it easy to run external tools
during a build, doesn't seem like a strong reason to rule out flatbuffers
here.

I want to support everyone as best we can, but any choice we make here
will have some tradeoffs. I see not being able to share the exact same
schema and type information as a huge downside relative to the cost of
having to run a binary during a build process.

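As a rough illustration of the schema validation point above: with a schemaless encoding such as JSON, each consumer ends up carrying a hand-written check along these lines. This is only a sketch. The message layout, the field names `op` and `args`, and the helper names are hypothetical and are not part of any Arrow format.

```python
import json

# Hypothetical layout for a serialized compute IR node (an assumption made
# for this example, not an Arrow specification).
REQUIRED_FIELDS = {"op": str, "args": list}


def validate_node(node):
    """Return a list of problems found in a decoded node (empty if valid)."""
    if not isinstance(node, dict):
        return [f"expected an object, got {type(node).__name__}"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in node:
            errors.append(f"missing field: {name}")
        elif not isinstance(node[name], expected):
            errors.append(f"field {name} should be {expected.__name__}")
    return errors


def decode(payload):
    """Parse a JSON payload and reject anything that fails validation."""
    node = json.loads(payload)
    errors = validate_node(node)
    if errors:
        raise ValueError("; ".join(errors))
    return node
```

With generated flatbuffers or protobuf code, field names and types come from the shared schema file rather than from per-consumer checks like this one.
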
To be clear, users should _definitely_ not be running flatc; only library
authors should be running it, as part of a development/build/release cycle.

>
> Regards
>
> Antoine.
>