Thanks for writing this. I agree. That is a good decision tree. +1
Best,
Jorge

On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> The discussion around adding another interval type to the Schema.fbs raises
> the issue of when do we decide to add a new type to the Schema.fbs vs using
> other means (primarily extension types [1]).
>
> A few criteria come to mind that could help decide (feedback welcome):
>
> 1. Is the type a new parameterization of an existing type?
>    - If Yes, and we believe the parameterization is useful and can be done
>      in a forward/backward compatible manner then we would update Schema.fbs.
>
> 2. Does the type itself have its own specification for processing (e.g.
>    JSON, BSON, Thrift, Avro, Protobuf)?
>    - If yes, we would NOT add them to Schema.fbs. I think this would
>      potentially yield too many new types.
>
> 3. Is the underlying encoding of the type already semantically supported
>    by a type? (e.g. if we want to encode physical lengths like meters these
>    can be represented by an integer).
>    - If yes, we would NOT update the specification. This seems like the
>      exact use-case that extension types are meant for.
>
> *How does this apply to Interval?*
>
> Interval extends an existing type in the specification and multiple "packed
> fields" cannot be easily communicated with the current version of the
> specification. Hence, I feel comfortable making the addition to Schema.fbs
>
> *What does this mean for other common types?*
>
> I think as types come up that are very common but we don't want to add to
> the Schema.fbs we should invest in formalizing them as "Well Known"
> Extension types. In this scenario, we would update the specification to
> include how to specify the extension type metadata (and still require at
> least two libraries support the Extension type before inclusion as "Well
> Known").
>
> *Practical implications*
>
> I think this means the type system in Schema.fbs is mostly closed (i.e.
> there is a high bar for adding new types). One potentially useful type to
> have would be a "packed struct" that supports something similar to python
> struct library [2]. I think this would likely cover many extension type
> use-cases.
>
> Thoughts?
>
> -Micah
>
> [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> [2] https://docs.python.org/3/library/struct.html
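
For concreteness, here is a rough sketch of what a "packed struct" value could look like using the Python struct module [2]. The field names and the "<iiq" layout (two int32 fields and one int64, little-endian, no padding) are my own assumptions for illustration; the proposal above does not specify them.

```python
import struct
from typing import Tuple

# Hypothetical layout, chosen only for illustration: two int32 fields
# followed by one int64 field. "<" selects little-endian byte order with
# standard sizes and no alignment padding.
PACKED_FORMAT = "<iiq"
PACKED_SIZE = struct.calcsize(PACKED_FORMAT)  # 4 + 4 + 8 bytes


def pack_fields(months: int, days: int, nanos: int) -> bytes:
    """Pack three integer fields into a single fixed-width binary value."""
    return struct.pack(PACKED_FORMAT, months, days, nanos)


def unpack_fields(buf: bytes) -> Tuple[int, int, int]:
    """Recover the three integer fields from a packed binary value."""
    return struct.unpack(PACKED_FORMAT, buf)


value = pack_fields(1, 15, 500)
assert len(value) == PACKED_SIZE
assert unpack_fields(value) == (1, 15, 500)
```

A fixed-width value like this could presumably be stored as fixed-size binary, with the format string carried in the extension type metadata, which seems close to the use case described above.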