What should I do to move this forward? Does Avro have a PIP process?
On Sat, Jun 29, 2019 at 3:26 PM Erik Erlandson wrote:
>
> Regarding schema, my proposal for fingerprints would be that units are fingerprinted based on their canonical form, as defined here <http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/>. Any two unit expressions having the same canonical form (including the corresponding coefficients) are exactly equivalent, and so their fingerprints can be the same. Possibly the unit could be stored on the schema in canonical form by convention, although canonical forms are frequently less intuitive to humans, so in that case the documentation value of the unit might be reduced for humans examining the schema.
>
> For schema evolution, a unit change where the previous and new units are convertible (also as defined at the above link) would be well defined, and the automatic transformation would simply be the correct unit conversion (e.g. seconds to milliseconds). If the unit changes to a non-convertible unit (e.g. seconds to bytes), then no automatic transformation exists, and attempting to resolve the old and new schemas would be an error. Note that establishing the conversion assumes that both the original and new schemas are available at read time.
>
>
> On Sat, Jun 29, 2019 at 11:55 AM Niels Basjes wrote:
>
>> I think we should approach this idea in two parts:
>>
>> 1) The schema. Things like: does a different unit mean a different schema fingerprint, even though the bytes remain the same? What does a different unit mean for schema evolution?
>>
>> 2) Language specifics. Scala has different possibilities than Java.
>>
>> On Sat, Jun 29, 2019, 18:59 Erik Erlandson wrote:
>>
>> > I've been puzzling over what can be done to support this in more widely-used languages.
>> > The dilemma relative to the current language ecosystem is that languages with "modern" type systems (Haskell, Rust, Scala, etc.) capable of supporting compile-time unit checking, in the particular style I've been exploring, are not yet widely used.
>> >
>> > With respect to Java, a couple of approaches are plausible. One is to enhance the language, for example with Java 8 compiler plugins. Another might be to implement a unit type system similar to squants <https://github.com/typelevel/squants>. This style of unit type system is not as flexible or intuitive as what can be done with Scala's latest type system sorcery, but it would allow the community to build out a Java-native type system that supports compile-time unit analysis. And its coverage of standard units could be made very good, as squants itself demonstrates.
>> >
>> > Python would also be a high-coverage target. I'm even less sure what to do for Python, as it has no compile-time type checking, but perhaps a squants-like Python class system would add value. Maybe Python's new type-hints feature could be leveraged?
>> >
>> > Regarding unit expression representation, I'm not unhappy with what I've prototyped in `coulomb-avro`, in broad strokes. It has deficiencies that would need addressing. It doesn't yet support standard unit abbreviations, nor does it understand plurals (e.g. it can parse "second" but not "seconds"). Since its "unit" field is just a custom metadata key, there is no enforcement. Parsers are currently instantiated via explicit lists of types, which is a property I like, but that may not work well in a world where multiple language bindings must be supported in a portable manner.
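The plural and abbreviation gaps mentioned for `coulomb-avro` could plausibly be closed by a normalization step that runs before unit lookup. Below is a minimal Python sketch, assuming a hypothetical alias table (`UNIT_ALIASES`), naive English pluralization (strip one trailing "s" from full names only), and simple expressions with at most one "/". The names `normalize_unit_name` and `parse_unit_expression` are illustrative; this is not coulomb-avro's actual parser.

```python
from __future__ import annotations

# Hypothetical alias table: canonical unit name -> accepted abbreviations.
# Illustrative only; these are not coulomb-avro's actual tables.
UNIT_ALIASES: dict[str, set[str]] = {
    "second": {"s", "sec"},
    "millisecond": {"ms"},
    "byte": {"B"},
    "megabyte": {"MB"},
}

# Reverse lookup from every accepted spelling to its canonical name.
# Abbreviations stay case-sensitive, since e.g. "MB" and "Mb" conventionally differ.
_LOOKUP: dict[str, str] = {}
for _name, _abbrevs in UNIT_ALIASES.items():
    _LOOKUP[_name] = _name
    for _abbrev in _abbrevs:
        _LOOKUP[_abbrev] = _name


def normalize_unit_name(token: str) -> str:
    """Map a full name, naive English plural, or abbreviation to its canonical name."""
    if token in _LOOKUP:
        return _LOOKUP[token]
    # Naive plural handling: "seconds" -> "second" (full names only, not abbreviations).
    if token.endswith("s") and token[:-1] in UNIT_ALIASES:
        return token[:-1]
    raise ValueError(f"unknown unit: {token!r}")


def parse_unit_expression(expr: str) -> tuple[list[str], list[str]]:
    """Split a simple 'a*b/c*d' expression (at most one '/') into normalized names."""
    num, _, den = expr.partition("/")

    def terms(part: str) -> list[str]:
        # Split a product on '*', ignoring surrounding whitespace and empty parts.
        return [normalize_unit_name(t.strip()) for t in part.split("*") if t.strip()]

    return terms(num), terms(den)
```

With this sketch, `parse_unit_expression("megabytes/second")` and `parse_unit_expression("MB/s")` both yield `(["megabyte"], ["second"])`. Keeping the alias table as plain data, rather than as language-specific types, is one way such a table could be shared across multiple language bindings, though that is a design assumption rather than anything settled in this thread.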
>> >
>> > On Sat, Jun 29, 2019 at 1:46 AM Niels Basjes wrote:
>> >
>> > > Hi,
>> > >
>> > > I attended your talk in Berlin and at the end I thought "too bad this is only Scala".
>> > >
>> > > I think it's a good idea to have this in Avro.
>> > >
>> > > The details will be tricky: for example, how to encode the units in the schema, especially because of the automatic conversion you spoke about.
>> > >
>> > > Niels
>> > >
>> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson wrote:
>> > >
>> > > > Hi Avro community,
>> > > >
>> > > > Recently I have been experimenting with Avro schemas that are extended with a "unit" field. By "unit" I mean expressions like "second" or "megabyte", that is, "units of measure".
>> > > >
>> > > > I delivered a short talk on my experiments at Berlin Buzzwords, which can be viewed here:
>> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
>> > > > I also wrote a short blog post that may be faster to ingest:
>> > > > http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
>> > > >
>> > > > I received some audience interest in making this concept "first class" for Avro, and so I'm writing to see what the Avro dev community thinks of the idea. One issue is that this kind of unit checking is currently only available for Scala (and specifically Scala 2.13+).
>> > > >
>> > > > The Scala project itself is here:
>> > > > https://github.com/erikerlandson/coulomb
>> > > >
>> > > > Cheers,
>> > > > Erik
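Niels's concern about encoding units in the schema together with automatic conversion, and Erik's reply proposing canonical-form fingerprints and convertibility-based schema evolution, can be made concrete with a small Python sketch. It assumes a hypothetical unit table (`UNITS`) mapping each name to an exact coefficient and base-unit exponents; the canonical form here is a simplified reading in the spirit of the linked algorithmic-unit-analysis post, not coulomb's representation. `unit_fingerprint` is a hypothetical SHA-256 over a text rendering of that form and is separate from Avro's own schema fingerprints, which are computed over Parsing Canonical Form. `conversion_factor` follows Avro's writer/reader schema-resolution framing, which matches the note that both schemas must be available at read time.

```python
from __future__ import annotations

import hashlib
from fractions import Fraction

# Hypothetical unit table: name -> (exact coefficient, {base unit: exponent}).
# Illustrative assumptions only, not coulomb's or Avro's definitions.
UNITS: dict[str, tuple[Fraction, dict[str, int]]] = {
    "second": (Fraction(1), {"second": 1}),
    "millisecond": (Fraction(1, 1000), {"second": 1}),
    "byte": (Fraction(1), {"byte": 1}),
    "kilobyte": (Fraction(1000), {"byte": 1}),
    "megabyte": (Fraction(10**6), {"byte": 1}),
}


def canonical_form(expr: tuple[list[str], list[str]]) -> tuple[Fraction, tuple[tuple[str, int], ...]]:
    """Reduce (numerator names, denominator names) to (coefficient, sorted base exponents)."""
    numerator, denominator = expr
    coef = Fraction(1)
    exponents: dict[str, int] = {}
    for names, sign in ((numerator, 1), (denominator, -1)):
        for name in names:
            unit_coef, bases = UNITS[name]
            coef *= unit_coef ** sign  # exact rational arithmetic, no float rounding
            for base, exp in bases.items():
                exponents[base] = exponents.get(base, 0) + sign * exp
    # Drop cancelled bases and sort, so equivalent expressions compare equal.
    signature = tuple(sorted((b, e) for b, e in exponents.items() if e != 0))
    return coef, signature


def unit_fingerprint(expr: tuple[list[str], list[str]]) -> str:
    """Hypothetical fingerprint: SHA-256 over a text rendering of the canonical form."""
    coef, signature = canonical_form(expr)
    text = f"{coef}|" + ",".join(f"{b}^{e}" for b, e in signature)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def conversion_factor(writer: tuple[list[str], list[str]],
                      reader: tuple[list[str], list[str]]) -> Fraction:
    """Multiplier taking a value in the writer schema's unit to the reader schema's unit.

    Raises ValueError when the units are not convertible (different base-unit signatures).
    """
    w_coef, w_sig = canonical_form(writer)
    r_coef, r_sig = canonical_form(reader)
    if w_sig != r_sig:
        raise ValueError(f"units {writer} and {reader} are not convertible")
    return w_coef / r_coef
```

In this sketch, megabyte/second and kilobyte/millisecond share a canonical form, `(Fraction(1000000), (("byte", 1), ("second", -1)))`, and therefore a fingerprint. A field written in seconds and read in milliseconds resolves with a factor of 1000, so `int(2 * conversion_factor((["second"], []), (["millisecond"], [])))` gives 2000. Resolving seconds against bytes raises `ValueError`, which mirrors the error case described for non-convertible units. Exact `Fraction` coefficients are used so that fingerprint equality is not disturbed by floating-point rounding.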
