I agree with others on this thread. Thanks for writing this down, Micah.

On Fri, Apr 30, 2021 at 11:16 AM Antoine Pitrou <anto...@python.org> wrote:
>
> I concur with what both Wes and Micah said.
>
> As for temporal types, they have widespread use and their semantics
> require dedicated treatment for arithmetic and conversion, so it's
> helpful to define dedicated types for them, as opposed to mere annotations.
>
> Regards
>
> Antoine.
>
>
> On 30/04/2021 at 16:40, Wes McKinney wrote:
> > I agree that the bar for adding new types to the Type union in Schema.fbs
> > should be quite high going forward. Using extension types increasingly for
> > adding specializations of built-in types will mean less burden for
> > implementations to simply "propagate forward" this data (by preserving the
> > extra metadata) even if they don't understand what it does. It would be
> > nice, therefore, to put us on a path to expanding our set of "official"
> > extension types (which would include things like JSON or UUID) since some
> > libraries may choose to implement convenience containers for these for
> > usability.
> >
> > On Fri, Apr 30, 2021 at 9:22 AM Brian Hulette <bhule...@apache.org> wrote:
> >
> >> +1 this looks good to me.
> >>
> >> My only concern is with criterion #3, "Is the underlying encoding of the
> >> type already semantically supported by a type?". I think this is a good
> >> criterion, but it's inconsistent with the current spec. By that criterion
> >> some existing types (Timestamp, Time, Duration, Date) should be well-known
> >> extension types, right?
> >>
> >> Perhaps we should explicitly indicate these types are grandfathered in [1]
> >> because they existed before extension types, to avoid tension with this
> >> criterion.
> >>
> >> Brian
> >>
> >> [1] https://en.wikipedia.org/wiki/Grandfather_clause
> >>
> >> On Thu, Apr 29, 2021 at 9:13 PM Jorge Cardoso Leitão <
> >> jorgecarlei...@gmail.com> wrote:
> >>
> >>> Thanks for writing this.
> >>>
> >>> I agree. That is a good decision tree.
> >>> +1
> >>>
> >>> Best,
> >>> Jorge
> >>>
> >>>
> >>> On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield <emkornfi...@gmail.com>
> >>> wrote:
> >>>
> >>>> The discussion around adding another interval type to Schema.fbs raises
> >>>> the issue of when we decide to add a new type to Schema.fbs vs. using
> >>>> other means (primarily extension types [1]).
> >>>>
> >>>> A few criteria come to mind that could help decide (feedback welcome):
> >>>>
> >>>> 1. Is the type a new parameterization of an existing type?
> >>>>    - If yes, and we believe the parameterization is useful and can be
> >>>>      done in a forward/backward compatible manner, then we would update
> >>>>      Schema.fbs.
> >>>>
> >>>> 2. Does the type itself have its own specification for processing (e.g.
> >>>>    JSON, BSON, Thrift, Avro, Protobuf)?
> >>>>    - If yes, we would NOT add them to Schema.fbs. I think this would
> >>>>      potentially yield too many new types.
> >>>>
> >>>> 3. Is the underlying encoding of the type already semantically supported
> >>>>    by a type? (e.g. if we want to encode physical lengths like meters,
> >>>>    these can be represented by an integer).
> >>>>    - If yes, we would NOT update the specification. This seems like the
> >>>>      exact use-case that extension types are meant for.
> >>>>
> >>>> * How does this apply to Interval? *
> >>>> Interval extends an existing type in the specification and multiple
> >>>> "packed fields" cannot be easily communicated with the current version
> >>>> of the specification. Hence, I feel comfortable making the addition to
> >>>> Schema.fbs.
> >>>>
> >>>> * What does this mean for other common types? *
> >>>>
> >>>> I think as types come up that are very common but that we don't want to
> >>>> add to Schema.fbs, we should invest in formalizing them as "Well Known"
> >>>> Extension types.
> >>>> In this scenario, we would update the specification to
> >>>> include how to specify the extension type metadata (and still require
> >>>> that at least two libraries support the Extension type before inclusion
> >>>> as "Well Known").
> >>>>
> >>>> * Practical implications *
> >>>>
> >>>> I think this means the type system in Schema.fbs is mostly closed (i.e.
> >>>> there is a high bar for adding new types). One potentially useful type
> >>>> to have would be a "packed struct" that supports something similar to
> >>>> Python's struct library [2]. I think this would likely cover many
> >>>> extension type use-cases.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> -Micah
> >>>>
> >>>> [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> >>>> [2] https://docs.python.org/3/library/struct.html
> >>>>
> >>>
> >>
> >
>
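
For anyone less familiar with the "packed struct" idea mentioned at the end of Micah's mail, here is a minimal sketch using only Python's struct module. The three-field layout (months, days, nanoseconds), the "<iiq" format string, and the helper names pack_interval / unpack_interval are assumptions made up for illustration; nothing in this thread fixes a layout. It only shows how several fields can travel as one fixed-width binary value.

```python
import struct

# Hypothetical layout, chosen for illustration only:
# int32 months, int32 days, int64 nanoseconds, little-endian, no padding.
INTERVAL_FORMAT = "<iiq"
INTERVAL_SIZE = struct.calcsize(INTERVAL_FORMAT)  # fixed width in bytes


def pack_interval(months, days, nanos):
    """Pack the three fields into one fixed-width binary value."""
    return struct.pack(INTERVAL_FORMAT, months, days, nanos)


def unpack_interval(buf):
    """Recover the (months, days, nanos) tuple from a packed value."""
    return struct.unpack(INTERVAL_FORMAT, buf)


packed = pack_interval(14, 3, 1_500_000_000)
fields = unpack_interval(packed)
```

A consumer that only knows the width can still carry such values around as opaque fixed-size binary. Only a consumer that also knows the format string can interpret the individual fields, which is roughly the role the extension type metadata would play.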