If I'm interpreting the situation correctly, there is an "Avro Enhancement Proposal" process, but no proposal has been filed in nearly a decade: https://cwiki.apache.org/confluence/display/AVRO/Avro+Enhancement+Proposals
As a start, I submitted a Jira issue to track this idea:
https://issues.apache.org/jira/browse/AVRO-2474

On Mon, Jul 8, 2019 at 10:42 AM Erik Erlandson wrote:
>
> What should I do to move this forward? Does Avro have a PIP process?
>
>
> On Sat, Jun 29, 2019 at 3:26 PM Erik Erlandson wrote:
>
>> Regarding schema, my proposal for fingerprints would be that units are
>> fingerprinted based on their canonical form, as defined here
>> <http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/>.
>> Any two unit expressions having the same canonical form (including the
>> corresponding coefficients) are exactly equivalent, and so their
>> fingerprints can be the same. Possibly the unit could be stored on the
>> schema in canonical form by convention, although canonical forms are
>> frequently not as intuitive to humans and so in that case the
>> documentation value of the unit might be reduced for humans examining
>> the schema.
>>
>> For schema evolution, a unit change such that the previous and new unit
>> are convertible (also defined at the above link) would be well defined,
>> and automatic transformation would just be the correct unit conversion
>> (e.g. seconds to milliseconds). If the unit changes to a non-convertible
>> unit (e.g. seconds to bytes) then no automatic transformation exists, and
>> attempting to resolve the old and new schema would be an error. Note that
>> establishing the conversion assumes that both original and new schemas
>> are available at read time.
>>
>>
>> On Sat, Jun 29, 2019 at 11:55 AM Niels Basjes wrote:
>>
>>> I think we should approach this idea in two parts:
>>>
>>> 1) The schema. Things like: does a different unit mean a different
>>> schema fingerprint even though the bytes remain the same? What does a
>>> different unit mean for schema evolution?
>>>
>>> 2) Language specifics. Scala has different possibilities than Java.
>>>
>>> On Sat, Jun 29, 2019, 18:59 Erik Erlandson wrote:
>>>
>>> > I've been puzzling over what can be done to support this in more
>>> > widely-used languages. The dilemma relative to the current language
>>> > ecosystem is that languages with "modern" type systems (Haskell, Rust,
>>> > Scala, etc) capable of supporting compile-time unit checking, in the
>>> > particular style I've been exploring, are not yet widely used.
>>> >
>>> > With respect to Java, a couple of approaches are plausible. One is to
>>> > enhance the language, for example with Java-8 compiler plugins. Another
>>> > might be to implement a unit type system similar to squants
>>> > <https://github.com/typelevel/squants>. This style of unit type system
>>> > is not as flexible or intuitive as what can be done with Scala's latest
>>> > type system sorcery, but it would allow the community to build out a
>>> > Java native type system that supports compile-time unit analysis. And
>>> > its coverage of standard units could be made very good, as squants
>>> > itself demonstrates.
>>> >
>>> > Python would also be a high-coverage target. I'm even less sure what to
>>> > do for Python, as it has no compile-time type checking, but perhaps a
>>> > squants-like Python class system would add value. Maybe Python's new
>>> > type-hints feature could be leveraged?
>>> >
>>> > Regarding unit expression representation, I'm not unhappy with what
>>> > I've prototyped in `coulomb-avro`, in broad strokes. It has
>>> > deficiencies that would need addressing. It doesn't yet support
>>> > standard unit abbreviations, nor does it understand plurals (e.g. it
>>> > can parse "second" but not "seconds"). Since its "unit" field is just a
>>> > custom metadata key, there is no enforcement. Parsers are currently
>>> > instantiated via explicit lists of types, which is a property I like,
>>> > but that may not work well in a world where multiple language bindings
>>> > must be supported in a portable manner.
>>> >
>>> >
>>> >
>>> > On Sat, Jun 29, 2019 at 1:46 AM Niels Basjes wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > I attended your talk in Berlin and at the end I thought "too bad this
>>> > > is only Scala".
>>> > >
>>> > > I think it's a good idea to have this in Avro.
>>> > >
>>> > > The details will be tricky: how to encode the units in the schema,
>>> > > for example.
>>> > > Especially because of the automatic conversion you spoke about.
>>> > >
>>> > > Niels
>>> > >
>>> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson wrote:
>>> > >
>>> > > > Hi Avro community,
>>> > > >
>>> > > > Recently I have been experimenting with Avro schemas that are
>>> > > > extended with a "unit" field. By "unit" I mean expressions like
>>> > > > "second", or "megabyte" - that is "units of measure".
>>> > > >
>>> > > > I delivered a short talk on my experiments at Berlin Buzzwords,
>>> > > > which can be viewed here:
>>> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
>>> > > > I also wrote a short blog post that may be faster to ingest:
>>> > > > http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
>>> > > >
>>> > > > I received some audience interest in making this concept "first
>>> > > > class" for Avro, and so I'm writing to see what the Avro dev
>>> > > > community thinks of the idea. One issue is that this kind of unit
>>> > > > checking is currently only available for Scala (and specifically
>>> > > > Scala 2.13+).
>>> > > >
>>> > > > The Scala project itself is here:
>>> > > > https://github.com/erikerlandson/coulomb
>>> > > >
>>> > > > Cheers,
>>> > > > Erik
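
To make the fingerprint and schema-evolution rules quoted above a bit more concrete, here is a minimal Python sketch. It is not the `coulomb-avro` implementation and not an Avro API: the `UNITS` table, the "s" abbreviation, the field layout, and the function names (`canonical`, `unit_fingerprint`, `conversion_factor`, `resolve`) are all hypothetical. It assumes a canonical form is a coefficient times a product of base units raised to integer powers, that two units are convertible when their base-unit parts match, and that a SHA-256 hash of the canonical form is an acceptable stand-in for a fingerprint.

```python
from fractions import Fraction
import hashlib

# Hypothetical unit table: name -> (coefficient, {base unit: exponent}).
# A real implementation would parse unit expressions instead.
UNITS = {
    "second": (Fraction(1), {"second": 1}),
    "s": (Fraction(1), {"second": 1}),  # assumed abbreviation
    "millisecond": (Fraction(1, 1000), {"second": 1}),
    "minute": (Fraction(60), {"second": 1}),
    "byte": (Fraction(1), {"byte": 1}),
    "kilobyte": (Fraction(1000), {"byte": 1}),
}


def canonical(unit):
    """Return (coefficient, sorted base-unit exponents) for a unit name."""
    coef, dims = UNITS[unit]
    return coef, tuple(sorted(dims.items()))


def unit_fingerprint(unit):
    """Hash the canonical form, coefficient included, so that exactly
    equivalent unit expressions share a fingerprint."""
    coef, dims = canonical(unit)
    text = f"{coef}*" + "*".join(f"{base}^{exp}" for base, exp in dims)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def conversion_factor(writer_unit, reader_unit):
    """Factor that converts a value in writer_unit to reader_unit.
    Raises ValueError when the units are not convertible."""
    w_coef, w_dims = canonical(writer_unit)
    r_coef, r_dims = canonical(reader_unit)
    if w_dims != r_dims:
        raise ValueError(
            f"cannot resolve unit {writer_unit!r} to {reader_unit!r}"
        )
    return w_coef / r_coef


def resolve(value, writer_field, reader_field):
    """Apply the unit conversion implied by the writer and reader fields.
    Both schemas must be available at read time."""
    factor = conversion_factor(writer_field["unit"], reader_field["unit"])
    return value * factor


# Fields carrying "unit" as a custom metadata key (layout is illustrative).
old_field = {"name": "latency", "type": "long", "unit": "second"}
new_field = {"name": "latency", "type": "long", "unit": "millisecond"}
bad_field = {"name": "latency", "type": "long", "unit": "byte"}

converted = resolve(3, old_field, new_field)  # seconds to milliseconds
```

Calling `resolve(3, old_field, bad_field)` raises `ValueError`, which corresponds to the seconds-to-bytes case where resolving the old and new schema is an error. Exact rational coefficients (`Fraction`) are used so that equivalent canonical forms compare and hash identically, which floating-point coefficients would not guarantee.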
