Hey all,

For some time I've been thinking that having a common serialized
representation of query plans would be helpful across multiple related
projects. I started working on something independently in this space
several months ago. Since then, Arrow started exploring "Arrow IR" and
Iceberg was proposing something similar to support a cross-engine
structured view. Given the different veins of interest, I think we should
combine forces on a consolidated, consensus-driven solution.

As I've had more conversations with different people, I've come to the
conclusion that given the complexity of the task and people's
competing priorities, a separate "Switzerland project" is the best way to
find common ground. As such, I've started to sketch out a specification [1]
called Substrait. One of my key goals with this effort is to expose Calcite
functionality to more users and to offer alternative ways to package it as
a microservice or series of microservices.

For those who are interested, please join the Substrait Slack. My first
goal is to reach consensus on the type system: simple [2], compound
[3] and physical [4] types. The general approach I'm proposing:

   - Use Spark, Trino, Arrow and Iceberg as the four indicators of whether
   something should be part of the spec. It must exist in at least two systems
   to be formalized.
   - Avoid a formal distinction between logical and physical (types,
   operators, etc.)
   - Lean more towards simple types than compound types when systems
   generally use only a constrained set of parameters (e.g. timestamp(3) and
   timestamp(6) as opposed to timestamp(x)).
   - Provide substantial structured extensibility (avoid black boxes as
   much as possible)
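
To make the first rule above concrete, here is a minimal sketch in Python.
It is not part of the spec: the names REFERENCE_SYSTEMS, MIN_SYSTEMS and
qualifies are hypothetical, and the sample candidates are placeholders
rather than claims about what any system actually supports.

```python
# The four reference systems named in the proposal.
REFERENCE_SYSTEMS = frozenset({"Spark", "Trino", "Arrow", "Iceberg"})

# "It must exist in at least two systems to be formalized."
MIN_SYSTEMS = 2


def qualifies(supporting_systems):
    """Return True if a candidate exists in at least MIN_SYSTEMS reference systems.

    Systems outside the four reference systems are ignored for the count.
    """
    return len(REFERENCE_SYSTEMS & set(supporting_systems)) >= MIN_SYSTEMS


# Placeholder data for illustration only (hypothetical names and support sets).
candidates = {
    "example_type_a": {"Spark", "Trino", "Arrow"},
    "example_type_b": {"Iceberg"},
    "example_type_c": {"Trino", "SomeOtherEngine"},
}

formalized = sorted(name for name, systems in candidates.items() if qualifies(systems))
print(formalized)
```

Only example_type_a passes here: example_type_c is found in just one of the
four reference systems, so the extra engine does not count toward the
threshold.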


Links for Substrait:
Site: https://substrait.io
Spec source: https://github.com/substrait-io/substrait/tree/main/site/docs
Binary format: https://github.com/substrait-io/substrait/tree/main/binary

Would love to hear your thoughts!
Jacques

[1] https://substrait.io/spec/specification/#components
[2] https://substrait.io/types/simple_logical_types/
[3] https://substrait.io/types/compound_logical_types/
[4] https://substrait.io/types/physical_types/
