I agree with others on this thread. Thanks for writing this down, Micah.

On Fri, Apr 30, 2021 at 11:16 AM Antoine Pitrou <anto...@python.org> wrote:

>
> I concur with both what Wes and Micah said.
>
> As for temporal types, they have widespread use and their semantics
> require dedicated treatment for arithmetic and conversion, so it's
> helpful to define dedicated types for them, as opposed to mere annotations.
>
> Regards
>
> Antoine.
>
>
> Le 30/04/2021 à 16:40, Wes McKinney a écrit :
> > I agree that the bar for adding new types to the Type union in Schema.fbs
> > should be quite high going forward. Using extension types increasingly for
> > adding specializations of built-in types will mean less burden for
> > implementations to simply "propagate forward" this data (by preserving the
> > extra metadata) even if they don't understand what it does. It would be
> > nice, therefore, to put us on a path to expanding our set of "official"
> > extension types (which would include things like JSON or UUID) since some
> > libraries may choose to implement convenience containers for these for
> > usability.
> >
> > On Fri, Apr 30, 2021 at 9:22 AM Brian Hulette <bhule...@apache.org> wrote:
> >
> >> +1 this looks good to me.
> >>
> >> My only concern is with criterion #3, "Is the underlying encoding of the
> >> type already semantically supported by a type?". I think this is a good
> >> criterion, but it's inconsistent with the current spec. By that criterion
> >> some existing types (Timestamp, Time, Duration, Date) should be well-known
> >> extension types, right?
> >>
> >> Perhaps we should explicitly indicate these types are grandfathered in [1]
> >> because they existed before extension types, to avoid tension with this
> >> criterion.
> >>
> >> Brian
> >>
> >> [1] https://en.wikipedia.org/wiki/Grandfather_clause
> >>
> >> On Thu, Apr 29, 2021 at 9:13 PM Jorge Cardoso Leitão <
> >> jorgecarlei...@gmail.com> wrote:
> >>
> >>> Thanks for writing this.
> >>>
> >>> I agree. That is a good decision tree. +1
> >>>
> >>> Best,
> >>> Jorge
> >>>
> >>>
> >>> On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield <emkornfi...@gmail.com>
> >>> wrote:
> >>>
> >>>> The discussion around adding another interval type to Schema.fbs raises
> >>>> the issue of when we decide to add a new type to Schema.fbs vs. using
> >>>> other means (primarily extension types [1]).
> >>>>
> >>>> A few criteria come to mind that could help decide (feedback welcome):
> >>>>
> >>>> 1.  Is the type a new parameterization of an existing type?
> >>>>      - If yes, and we believe the parameterization is useful and can be
> >>>> done in a forward/backward compatible manner, then we would update
> >>>> Schema.fbs.
> >>>>
> >>>> 2.  Does the type itself have its own specification for processing (e.g.
> >>>> JSON, BSON, Thrift, Avro, Protobuf)?
> >>>>    - If yes, we would NOT add it to Schema.fbs.  I think this would
> >>>> potentially yield too many new types.
> >>>>
> >>>> 3.  Is the underlying encoding of the type already semantically supported
> >>>> by a type? (e.g. if we want to encode physical lengths like meters, these
> >>>> can be represented by an integer).
> >>>>     - If yes, we would NOT update the specification.  This seems like the
> >>>> exact use-case that extension types are meant for.
> >>>>
> >>>> * How does this apply to Interval? *
> >>>> Interval extends an existing type in the specification and multiple "packed
> >>>> fields" cannot be easily communicated with the current version of the
> >>>> specification.  Hence, I feel comfortable making the addition to Schema.fbs.
> >>>> * What does this mean for other common types? *
> >>>>
> >>>> I think as types come up that are very common but that we don't want to
> >>>> add to Schema.fbs, we should invest in formalizing them as "Well Known"
> >>>> Extension types.  In this scenario, we would update the specification to
> >>>> include how to specify the extension type metadata (and still require at
> >>>> least two libraries to support the Extension type before inclusion as
> >>>> "Well Known").
> >>>>
> >>>> * Practical implications *
> >>>>
> >>>> I think this means the type system in Schema.fbs is mostly closed (i.e.
> >>>> there is a high bar for adding new types). One potentially useful type to
> >>>> have would be a "packed struct" that supports something similar to Python's
> >>>> struct library [2].  I think this would likely cover many extension type
> >>>> use-cases.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> -Micah
> >>>>
> >>>> [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> >>>> [2] https://docs.python.org/3/library/struct.html
> >>>>
> >>>
> >>
> >
>
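
For reference, the three criteria in Micah's message can be read as a small decision procedure. The following is only a rough sketch in Python: `TypeProposal`, its fields, and `decide` are hypothetical names made up for illustration and are not part of Arrow or any of its libraries. The final fallback is also an assumption, since the thread only says the bar for new types is high.

```python
from dataclasses import dataclass


@dataclass
class TypeProposal:
    """Hypothetical description of a proposed type (illustration only)."""
    name: str
    is_new_parameterization: bool = False     # criterion 1
    useful_and_compatible: bool = False       # criterion 1, follow-up condition
    has_own_processing_spec: bool = False     # criterion 2 (e.g. JSON, BSON)
    encoding_already_supported: bool = False  # criterion 3 (e.g. meters as integers)


def decide(p: TypeProposal) -> str:
    """Suggest a route for a proposal, following the criteria in the thread."""
    # 1. A useful, forward/backward compatible parameterization of an
    #    existing type is the case where Schema.fbs would be updated.
    if p.is_new_parameterization and p.useful_and_compatible:
        return "update Schema.fbs"
    # 2. Types with their own processing specification stay out of Schema.fbs.
    if p.has_own_processing_spec:
        return "extension type"
    # 3. Types whose encoding an existing type already supports are the
    #    use-case extension types are meant for.
    if p.encoding_already_supported:
        return "extension type"
    # Assumption: the thread does not spell out this case.
    return "discuss on the mailing list"


interval = TypeProposal("Interval", is_new_parameterization=True,
                        useful_and_compatible=True)
json_type = TypeProposal("JSON", has_own_processing_spec=True)
meters = TypeProposal("Meters", encoding_already_supported=True)

for proposal in (interval, json_type, meters):
    print(proposal.name, "->", decide(proposal))
```

Under these assumptions Interval lands in Schema.fbs, while JSON and a meters type are routed to extension types, which matches the conclusions drawn in the thread.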

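To make the "packed struct" idea from the quoted message a little more concrete, here is a minimal sketch using Python's struct module [2]. The field layout (int32 months, int32 days, int64 nanoseconds, little-endian, no padding) and the helper names are assumptions chosen for illustration; the thread itself does not fix a layout.

```python
import struct

# Assumed layout, for illustration only: int32 months, int32 days,
# int64 nanoseconds, little-endian with no padding between fields.
INTERVAL_FORMAT = "<iiq"
INTERVAL_SIZE = struct.calcsize(INTERVAL_FORMAT)  # 4 + 4 + 8 bytes


def pack_interval(months: int, days: int, nanoseconds: int) -> bytes:
    """Pack the three fields into one fixed-width binary value."""
    return struct.pack(INTERVAL_FORMAT, months, days, nanoseconds)


def unpack_interval(raw: bytes) -> tuple:
    """Recover (months, days, nanoseconds) from a packed value."""
    return struct.unpack(INTERVAL_FORMAT, raw)


raw = pack_interval(1, 15, 500)
print(len(raw), unpack_interval(raw))
```

A format string like this could in principle travel alongside a fixed-width value so that readers know how to split it into fields. How that would be specified is not settled in this thread.
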