+1 this looks good to me.

My only concern is with criterion #3, "Is the underlying encoding of the
type already semantically supported by a type?". I think this is a good
criterion, but it's inconsistent with the current spec. By that criterion,
some existing types (Timestamp, Time, Duration, Date) should be well-known
extension types, right?

Perhaps we should explicitly indicate these types are grandfathered in [1]
because they existed before extension types, to avoid tension with this
criterion.
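
To make the tension concrete, here is a minimal sketch of the decision tree
as I read it. The names (Candidate, placement) are hypothetical, the answers
filled in for Timestamp and Interval are my own reading, and the fall-through
case is an assumption because the proposal does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical answers to the three questions for one proposed type."""
    name: str
    new_parameterization: bool        # criterion 1
    compatible_change: bool           # criterion 1: useful and forward/backward compatible
    has_own_spec: bool                # criterion 2 (e.g. JSON, BSON, Avro)
    encoding_already_supported: bool  # criterion 3 (e.g. meters stored as an integer)


def placement(c: Candidate) -> str:
    """Apply the three criteria in the order they are listed in the proposal."""
    if c.new_parameterization and c.compatible_change:
        return "Schema.fbs"
    if c.has_own_spec:
        return "extension type"
    if c.encoding_already_supported:
        return "extension type"
    # Assumption: the proposal does not say what happens when no criterion applies.
    return "needs discussion"


# Timestamp is stored as a 64-bit integer, so criterion 3 applies to it.
timestamp = Candidate("Timestamp", False, False, False, True)
# Interval is described in the proposal as extending an existing type.
interval = Candidate("Interval", True, True, False, False)

for candidate in (timestamp, interval):
    print(candidate.name, "->", placement(candidate))
```

If Timestamp were proposed today, criterion 3 would send it to an extension
type, which is why an explicit grandfathering note seems useful.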

Brian

[1] https://en.wikipedia.org/wiki/Grandfather_clause

On Thu, Apr 29, 2021 at 9:13 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Thanks for writing this.
>
> I agree. That is a good decision tree. +1
>
> Best,
> Jorge
>
>
> On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > The discussion around adding another interval type to the Schema.fbs
> > raises the issue of when we decide to add a new type to the Schema.fbs
> > vs using other means (primarily extension types [1]).
> >
> > A few criteria come to mind that could help decide (feedback welcome):
> >
> > 1.  Is the type a new parameterization of an existing type?
> >     - If yes, and we believe the parameterization is useful and can be
> >       done in a forward/backward compatible manner, then we would update
> >       Schema.fbs.
> >
> > 2.  Does the type itself have its own specification for processing (e.g.
> > JSON, BSON, Thrift, Avro, Protobuf)?
> >     - If yes, we would NOT add it to Schema.fbs.  I think this would
> >       potentially yield too many new types.
> >
> > 3.  Is the underlying encoding of the type already semantically supported
> > by a type? (e.g. if we want to encode physical lengths like meters these
> > can be represented by an integer).
> >    - If yes, we would NOT update the specification.  This seems like the
> > exact use-case that extension types are meant for.
> >
> > * How does this apply to Interval? *
> > Interval extends an existing type in the specification and multiple
> > "packed fields" cannot be easily communicated with the current version
> > of the specification.  Hence, I feel comfortable making the addition to
> > Schema.fbs.
> >
> > * What does this mean for other common types? *
> >
> > I think that, as types come up that are very common but that we don't
> > want to add to the Schema.fbs, we should invest in formalizing them as
> > "Well Known" Extension types.  In this scenario, we would update the
> > specification to include how to specify the extension type metadata (and
> > still require that at least two libraries support the Extension type
> > before inclusion as "Well Known").
> >
> > * Practical implications *
> >
> > I think this means the type system in Schema.fbs is mostly closed (i.e.
> > there is a high bar for adding new types). One potentially useful type to
> > have would be a "packed struct" that supports something similar to the
> > Python struct library [2].  I think this would likely cover many
> > extension type use-cases.
> >
> > Thoughts?
> >
> > -Micah
> >
> > [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> > [2] https://docs.python.org/3/library/struct.html
> >
>
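
On the "packed struct" idea in Micah's message: a rough illustration of what
the Python struct library [2] does, for anyone unfamiliar with it. The field
layout below (months, days, nanoseconds) and the helper names are
hypothetical, chosen only to show several fields packed into one fixed-width
value; this is not the layout under discussion for Schema.fbs.

```python
import struct

# Hypothetical layout: months (int32), days (int32), nanoseconds (int64).
# "<" selects little-endian byte order with no padding between fields.
INTERVAL_FORMAT = "<iiq"


def pack_interval(months: int, days: int, nanos: int) -> bytes:
    """Pack the three fields into one fixed-width byte string."""
    return struct.pack(INTERVAL_FORMAT, months, days, nanos)


def unpack_interval(buf: bytes) -> tuple:
    """Recover the three fields from a packed value."""
    return struct.unpack(INTERVAL_FORMAT, buf)


value = pack_interval(1, 15, 500)
print(struct.calcsize(INTERVAL_FORMAT), "bytes per value")
print(unpack_interval(value))
```

A format string like this is the kind of metadata a "packed struct" type
would need to carry so that readers can interpret the fixed-width bytes.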
