Regarding the schema, my proposal for fingerprints is that units be
fingerprinted based on their canonical form, as defined here
<http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/>.
Any two unit expressions with the same canonical form (including the
corresponding coefficients) are exactly equivalent, so their fingerprints
can be the same. The unit could also be stored in the schema in canonical
form by convention, but canonical forms are often less intuitive to read,
which would reduce the unit's documentation value for people examining the
schema.
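
As a rough illustration of the fingerprinting idea, here is a minimal Python
sketch. The unit table, the `canonical` and `fingerprint` helpers, and the
choice of SHA-256 are all assumptions made for this example; none of them
come from coulomb or Avro, and the canonicalization described at the link
above covers far more than this toy version.

```python
from fractions import Fraction
import hashlib

# Hypothetical unit table: name -> (coefficient, {base unit: exponent}).
UNITS = {
    "second": (Fraction(1), {"second": 1}),
    "millisecond": (Fraction(1, 1000), {"second": 1}),
    "minute": (Fraction(60), {"second": 1}),
    "byte": (Fraction(1), {"byte": 1}),
    "kilobyte": (Fraction(1000), {"byte": 1}),
}

def canonical(expr):
    """Reduce a unit expression to (coefficient, sorted base-unit exponents).

    `expr` is a list of (unit name, integer exponent) pairs, for example
    [("kilobyte", 1), ("minute", -1)] for kilobytes per minute.
    """
    coef = Fraction(1)
    exponents = {}
    for name, power in expr:
        unit_coef, bases = UNITS[name]
        coef *= unit_coef ** power
        for base, exp in bases.items():
            exponents[base] = exponents.get(base, 0) + exp * power
    # Drop base units that cancelled out, and sort for a stable ordering.
    signature = tuple(sorted((b, e) for b, e in exponents.items() if e != 0))
    return coef, signature

def fingerprint(expr):
    """Hash the canonical form, coefficient included."""
    coef, signature = canonical(expr)
    text = f"{coef.numerator}/{coef.denominator}|"
    text += "*".join(f"{base}^{exp}" for base, exp in signature)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With this table, "minute" and "second * minute / second" reduce to the same
canonical form and so share a fingerprint, while "minute" and "second" have
different coefficients and do not.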

For schema evolution, a unit change in which the previous and new units are
convertible (also defined at the above link) would be well defined, and the
automatic transformation would simply be the correct unit conversion (e.g.
seconds to milliseconds). If the unit changes to a non-convertible unit
(e.g. seconds to bytes) then no automatic transformation exists, and
attempting to resolve the old and new schemas would be an error. Note that
establishing the conversion assumes that both the original and new schemas
are available at read time.
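
To make the resolution rule concrete, here is a small Python sketch under
stated assumptions: the `BASE` table and the `conversion_factor` and
`resolve` helpers are hypothetical names invented for this example, and each
unit is reduced to just a coefficient and a single base unit rather than a
full canonical form.

```python
from fractions import Fraction

# Hypothetical table: unit name -> (coefficient, base unit).
BASE = {
    "second": (Fraction(1), "second"),
    "millisecond": (Fraction(1, 1000), "second"),
    "byte": (Fraction(1), "byte"),
}

def conversion_factor(writer_unit, reader_unit):
    """Factor that maps a value in the writer's unit to the reader's unit."""
    writer_coef, writer_base = BASE[writer_unit]
    reader_coef, reader_base = BASE[reader_unit]
    if writer_base != reader_base:
        # Non-convertible units: schema resolution should fail.
        raise ValueError(
            f"cannot convert {writer_unit!r} to {reader_unit!r}"
        )
    return writer_coef / reader_coef

def resolve(value, writer_unit, reader_unit):
    """Convert a value written under the old schema for the new schema."""
    return value * conversion_factor(writer_unit, reader_unit)
```

Here `resolve(3, "second", "millisecond")` multiplies by 1000, while
`resolve(3, "second", "byte")` raises an error because no conversion exists.
Both the writer's and the reader's unit are needed to compute the factor,
which is why both schemas must be on hand at read time.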


On Sat, Jun 29, 2019 at 11:55 AM Niels Basjes <[email protected]> wrote:

> I think we should approach this idea in two parts:
>
> 1) The schema. Things like does a different unit mean a different schema
> fingerprint even though the bytes remain the same. What does a different
> unit mean for schema evolution.
>
> 2) Language specifics. Scala has different possibilities than Java.
>
> On Sat, Jun 29, 2019, 18:59 Erik Erlandson <[email protected]> wrote:
>
> > I've been puzzling over what can be done to support this in more
> > widely-used languages. The dilemma relative to the current language
> > ecosystem is that languages with "modern" type systems (Haskell, Rust,
> > Scala, etc) capable of supporting compile-time unit checking, in the
> > particular style I've been exploring, are not yet widely used.
> >
> > With respect to Java, a couple approaches are plausible. One is to enhance
> > the language, for example with Java-8 compiler plugins. Another might be to
> > implement a unit type system similar to squants
> > <https://github.com/typelevel/squants>. This style of unit type system is
> > not as flexible or intuitive as what can be done with Scala's latest type
> > system sorcery, but it would allow the community to build out a Java native
> > type system that supports compile-time unit analysis. And its coverage of
> > standard units could be made very good, as squants itself demonstrates.
> >
> > Python would also be a high-coverage target. I'm even less sure what to do
> > for python, as it has no compile-time type checking, but perhaps a
> > squants-like python class system would add value. Maybe python's new
> > type-hints feature could be leveraged?
> >
> > Regarding unit expression representation, I'm not unhappy with what I've
> > prototyped in `coulomb-avro`, in broad strokes. It has deficiencies that
> > would need addressing. It doesn't yet support standard unit abbreviations,
> > nor does it understand plurals (e.g. it can parse "second" but not
> > "seconds"). Since its "unit" field is just a custom metadata key, there is
> > no enforcement. Parsers are currently instantiated via explicit lists of
> > types, which is a property I like, but that may not work well in a world
> > where multiple language bindings must be supported in a portable manner.
> >
> >
> >
> > On Sat, Jun 29, 2019 at 1:46 AM Niels Basjes <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I attended your talk in Berlin and at the end I thought "too bad this is
> > > only Scala".
> > >
> > > I think it's a good idea to have this in Avro.
> > >
> > > The details will be tricky: How to encode the units in the schema for
> > > example.
> > > Especially because of the automatic conversion you spoke about.
> > >
> > > Niels
> > >
> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson <[email protected]> wrote:
> > >
> > > > Hi Avro community,
> > > >
> > > > Recently I have been experimenting with avro schema that are extended with
> > > > a "unit" field. By "unit" I mean expressions like "second", or "megabyte" -
> > > > that is "units of measure".
> > > >
> > > > I delivered a short talk on my experiments at Berlin Buzzwords, which can
> > > > be viewed here:
> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
> > > > I also wrote a short blog post that may be faster to ingest:
> > > > http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
> > > >
> > > > I received some audience interest in making this concept "first class" for
> > > > avro, and so I'm writing to see what the avro dev community thinks of the
> > > > idea. One issue is that this kind of unit checking is currently only
> > > > available for Scala (and specifically scala 2.13 +).
> > > >
> > > > The Scala project itself is here:
> > > > https://github.com/erikerlandson/coulomb
> > > >
> > > > Cheers,
> > > > Erik
> > > >
> > >
> >
>
