What should I do to move this forward? Does Avro have a PIP process?

On Sat, Jun 29, 2019 at 3:26 PM Erik Erlandson <[email protected]> wrote:

>
> Regarding schema, my proposal for fingerprints would be that units are
> fingerprinted based on their canonical form, as defined here
> <http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/>.
> Any two unit expressions having the same canonical form (including the
> corresponding coefficients) are exactly equivalent, and so their
> fingerprints can be the same. Possibly the unit could be stored on the
> schema in canonical form by convention, although canonical forms are
> frequently not as intuitive to humans and so in that case the documentation
> value of the unit might be reduced for humans examining the schema.
>
> For schema evolution, a unit change in which the previous and new units
> are convertible (also defined at the above link) would be well defined,
> and the automatic transformation would just be the correct unit conversion
> (e.g. seconds to milliseconds). If the unit changes to a non-convertible
> unit (e.g. seconds to bytes) then no automatic transformation exists, and
> attempting to resolve the old and new schema would be an error. Note that
> establishing the conversion assumes that both original and new schemas are
> available at read time.
>
>
> On Sat, Jun 29, 2019 at 11:55 AM Niels Basjes <[email protected]> wrote:
>
>> I think we should approach this idea in two parts:
>>
>> 1) The schema. Things like does a different unit mean a different schema
>> fingerprint even though the bytes remain the same. What does a different
>> unit mean for schema evolution.
>>
>> 2) Language specifics. Scala has different possibilities than Java.
>>
>> On Sat, Jun 29, 2019, 18:59 Erik Erlandson <[email protected]> wrote:
>>
>> > I've been puzzling over what can be done to support this in more
>> > widely-used languages. The dilemma relative to the current language
>> > ecosystem is that languages with "modern" type systems (Haskell, Rust,
>> > Scala, etc) capable of supporting compile-time unit checking, in the
>> > particular style I've been exploring, are not yet widely used.
>> >
>> > With respect to Java, a couple approaches are plausible. One is to enhance
>> > the language, for example with Java-8 compiler plugins. Another might be to
>> > implement a unit type system similar to squants
>> > <https://github.com/typelevel/squants>. This style of unit type system is
>> > not as flexible or intuitive as what can be done with Scala's latest type
>> > system sorcery, but it would allow the community to build out a Java native
>> > type system that supports compile-time unit analysis. And its coverage of
>> > standard units could be made very good, as squants itself demonstrates.
>> >
>> > Python would also be a high-coverage target. I'm even less sure what to do
>> > for python, as it has no compile-time type checking, but perhaps a
>> > squants-like python class system would add value. Maybe python's new
>> > type-hints feature could be leveraged?
>> >
>> > Regarding unit expression representation, I'm not unhappy with what I've
>> > prototyped in `coulomb-avro`, in broad strokes. It has deficiencies that
>> > would need addressing. It doesn't yet support standard unit abbreviations,
>> > nor does it understand plurals (e.g. it can parse "second" but not
>> > "seconds"). Since its "unit" field is just a custom metadata key, there is
>> > no enforcement. Parsers are currently instantiated via explicit lists of
>> > types, which is a property I like, but that may not work well in a world
>> > where multiple language bindings must be supported in a portable manner.
>> >
>> >
>> >
>> > On Sat, Jun 29, 2019 at 1:46 AM Niels Basjes <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I attended your talk in Berlin and at the end I thought "too bad this is
>> > > only Scala".
>> > >
>> > > I think it's a good idea to have this in Avro.
>> > >
>> > > The details will be tricky: How to encode the units in the schema for
>> > > example.
>> > > Especially because of the automatic conversion you spoke about.
>> > >
>> > > Niels
>> > >
>> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson <[email protected]> wrote:
>> > >
>> > > > Hi Avro community,
>> > > >
>> > > > Recently I have been experimenting with Avro schemas that are extended
>> > > > with a "unit" field. By "unit" I mean expressions like "second" or
>> > > > "megabyte", that is, "units of measure".
>> > > >
>> > > > I delivered a short talk on my experiments at Berlin Buzzwords, which can
>> > > > be viewed here:
>> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
>> > > > I also wrote a short blog post that may be faster to ingest:
>> > > > http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
>> > > >
>> > > > I received some audience interest in making this concept "first class" for
>> > > > avro, and so I'm writing to see what the avro dev community thinks of the
>> > > > idea. One issue is that this kind of unit checking is currently only
>> > > > available for Scala (and specifically scala 2.13 +).
>> > > >
>> > > > The Scala project itself is here:
>> > > > https://github.com/erikerlandson/coulomb
>> > > >
>> > > > Cheers,
>> > > > Erik
>> > > >
>> > >
>> >
>>
>
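
To make the fingerprint and schema-evolution rules quoted above concrete, here is a rough Python sketch. It is only an illustration under assumptions of mine: the unit table, the "s" abbreviation, and the function names are all hypothetical, and it is not how `coulomb` or `coulomb-avro` is implemented. A unit is reduced to a coefficient plus base-unit exponents; units with the same exponents are treated as convertible, and units with the same full canonical form share a fingerprint.

```python
from fractions import Fraction
import hashlib

# Hypothetical unit table: name -> (coefficient, {base unit: exponent}).
# "second" and "byte" act as base units; every entry is illustrative only.
UNITS = {
    "second": (Fraction(1), {"second": 1}),
    "s": (Fraction(1), {"second": 1}),  # assumed abbreviation, same unit
    "millisecond": (Fraction(1, 1000), {"second": 1}),
    "byte": (Fraction(1), {"byte": 1}),
    "megabyte": (Fraction(10**6), {"byte": 1}),
}

def canonical(unit):
    """Return (coefficient, sorted base-unit exponents) for a unit name."""
    coef, dims = UNITS[unit]
    return coef, tuple(sorted(dims.items()))

def fingerprint(unit):
    """Hash the canonical form, so equivalent units share a fingerprint."""
    coef, dims = canonical(unit)
    return hashlib.sha256(f"{coef}*{dims}".encode("utf-8")).hexdigest()

def conversion_factor(old_unit, new_unit):
    """Factor that maps a value written in old_unit into new_unit."""
    old_coef, old_dims = canonical(old_unit)
    new_coef, new_dims = canonical(new_unit)
    if old_dims != new_dims:
        # Non-convertible units: schema resolution should fail.
        raise ValueError(f"cannot convert {old_unit} to {new_unit}")
    return old_coef / new_coef
```

With this table, `conversion_factor("second", "millisecond")` gives 1000, while `conversion_factor("second", "byte")` raises `ValueError`, which corresponds to the "resolving the schemas is an error" case. As noted in the quoted text, both the writer and reader units must be known at read time for this to work.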

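For anyone who has not seen the prototype, the "unit as a custom metadata key" idea mentioned in the thread might look roughly like the sketch below. The record and field names are made up, and placing "unit" on the field is my assumption about the layout. The Avro specification permits extra attributes as metadata that do not affect serialization, which is also why nothing enforces them today.

```python
import json

# Illustrative schema; the record name and field names are hypothetical.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Request",
  "fields": [
    {"name": "latency", "type": "double", "unit": "second"},
    {"name": "payload", "type": "long", "unit": "megabyte"},
    {"name": "host", "type": "string"}
  ]
}
"""

def field_units(schema_json):
    """Map each field name to its "unit" attribute, or None if absent."""
    schema = json.loads(schema_json)
    return {f["name"]: f.get("unit") for f in schema["fields"]}

units = field_units(SCHEMA_JSON)
```

Here `units` maps "latency" to "second", "payload" to "megabyte", and "host" to `None`. A reader that ignores the "unit" key still decodes the data unchanged, so any checking has to come from the language binding.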