Hi Patrick,

> I'm working with Steve on this issue. Can you please share what you have
> in mind for something more general than Gandiva's serialized expressions?

Not necessarily something "more" general, but we should ensure that the approach taken is capable of representing the same information as Gandiva, so that we can ultimately work toward convergence between the two.

> I'm currently working through a design. I imagine we will have a FlatBuffer
> schema defining all expression types and have the different cpp expression
> classes (e.g. ComparisonExpression) act as wrappers around the generated
> flatbuf structs.

This sounds like the general approach taken for Schema.fbs, so it is probably a reasonable place to start. As Wes said, there will probably be a lot of input in this area, but having a concrete proposal would help guide the conversation.

> I also noticed that the data types used in filters are not backed by
> format/Expression.fbs and instead use the types defined in cpp/arrow/type.h

Do you mean Schema.fbs?

Thanks,
Micah

On Thu, Jul 9, 2020 at 1:56 PM Patrick Pai <patrick.m....@gmail.com> wrote:

> I'm working with Steve on this issue. Can you please share what you have
> in mind for something more general than Gandiva's serialized expressions?
>
> I'm currently working through a design. I imagine we will have a
> FlatBuffer schema defining all expression types and have the different cpp
> expression classes (e.g. ComparisonExpression) act as wrappers around the
> generated flatbuf structs.
>
> I also noticed that the data types used in filters are not backed by
> format/Expression.fbs and instead use the types defined in cpp/arrow/type.h
> I'm thinking it would be good to make the move to using Expression.fbs so
> that the data types themselves are also language independent. I'd
> appreciate any feedback or thoughts.
>
> On 2020/07/06 21:44:40, Wes McKinney <wesmck...@gmail.com> wrote:
> > I would also be interested in having a reusable serialized format for
> > filter- and projection-like expressions.
> > I think trying to go so far
> > as full logical query plans suitable for building a SQL engine is
> > perhaps a bit too far but we could start small with the use case from
> > the JNI Datasets PR as a motivating example. We should also consider
> > replacing or deprecating Gandiva's serialized expressions in favor of
> > something more general.
> >
> > It may be a slight bikeshed issue, but I wouldn't be thrilled about
> > having this be based on Protocol Buffers, because of the runtime
> > requirement (on libprotobuf.so / libprotobuf.a) it introduces into C++
> > applications. Flatbuffers might be less pleasant developer UX in Java
> > but at least in C++ the fact that Flatbuffers results in zero build-
> > or runtime dependencies is a significant advantage.