Re: [DISCUSS] Splitting out the Arrow format directory

Phillip Cloud Wed, 11 Aug 2021 14:11:11 -0700

On Wed, Aug 11, 2021 at 4:21 PM David Li <lidav...@apache.org> wrote:


> If the worry is public distribution (i.e. requiring all downstream
> projects to also run flatc in their builds) we could perhaps ship a package
> that just consists of the generated code (though that's definitely more
> packaging burden, and won't help when you're doing development against
> in-progress or unreleased changes).
>
> -David
>

Arrow need not take on yet another packaging burden here: library authors
can run flatc during their development and release cycles and ship the
generated code alongside their library code (whatever that means for the
specific language). End users of, say, ibis would never need to think about
having flatc around.
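For concreteness, here is a minimal sketch (Python, standard library only) of
the maintainer-side step described above. The helper names `flatc_command`
and `generate_bindings`, the `format/Schema.fbs` input path and the
`mylib/_generated` output directory are hypothetical; only the `flatc`
options `--python` and `-o` come from the FlatBuffers compiler itself.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def flatc_command(schemas: Sequence[str], out_dir: str,
                  lang_flag: str = "--python") -> List[str]:
    """Build the flatc argument list (options first, then .fbs inputs)."""
    return ["flatc", lang_flag, "-o", out_dir, *schemas]


def generate_bindings(schemas: Sequence[str], out_dir: str,
                      lang_flag: str = "--python") -> None:
    """Maintainer-only step (hypothetical helper): regenerate bindings.

    The output directory is committed or packaged with the library, so
    people who merely install the library never need flatc.
    """
    if shutil.which("flatc") is None:
        raise RuntimeError("flatc is not on PATH; only maintainers need it")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True turns a nonzero flatc exit status into CalledProcessError
    subprocess.run(flatc_command(schemas, out_dir, lang_flag), check=True)


if __name__ == "__main__":
    # Hypothetical layout: Arrow's format/Schema.fbs vendored into the repo.
    print(" ".join(flatc_command(["format/Schema.fbs"], "mylib/_generated")))
```

A release script would call something like
`generate_bindings(["format/Schema.fbs"], "mylib/_generated")` and ship the
resulting modules with the package; installing the package then needs only
the `flatbuffers` runtime, not `flatc`.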


>
> On Wed, Aug 11, 2021, at 16:16, Phillip Cloud wrote:
> > On Wed, Aug 11, 2021 at 4:05 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> > >
> > > On 11/08/2021 at 22:02, Phillip Cloud wrote:
> > > > On Wed, Aug 11, 2021 at 3:58 PM Antoine Pitrou <anto...@python.org> wrote:
> > > >
> > > >>
> > > >> On 11/08/2021 at 21:56, Phillip Cloud wrote:
> > > >>> I can see how that might be a bit circular. Let me start from the
> > > >>> perspective of requirements. We want to be able to reuse Arrow's
> > > >>> types and schema without having to write additional code to move
> > > >>> back and forth between compute IR and not-compute-IR. I think that
> > > >>> leaves only flatbuffers as an option.
> > > >>
> > > >> If that's the case then agreed (well, you can always embed as a raw
> > > >> bytestring in other formats, but that wouldn't be pretty).
> > > >>
> > > >> I just wonder what the complexity of using Flatbuffers is for, e.g.,
> > > >> Python.
> > > >>
> > > >
> > > > IMO the complexity isn't high, but the generated code is definitely
> > > > not idiomatic
> > > > (https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html).
> > >
> > > Wow. And you also have to integrate `flatc` into your build chain?
> > >
> >
> > Yeah, that is a drawback here, though I don't see needing to run flatc as a
> > major downside given the upside of not having to write additional code to
> > move between formats.
> >
> > Is there something particularly onerous about needing to run a codegen step
> > in a build process (other than it being build-step number 1000 in a
> > death-by-1000-build-steps scenario)?
> >
> >
> > >
> > > IMHO that compares poorly to JSON or MsgPack, for example.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
>
