If I'm interpreting the situation correctly, Avro has an "Avro Enhancement
Proposal" process, but no proposals have been filed in nearly a decade:
https://cwiki.apache.org/confluence/display/AVRO/Avro+Enhancement+Proposals

As a start, I filed a JIRA issue to track this idea:
https://issues.apache.org/jira/browse/AVRO-2474



On Mon, Jul 8, 2019 at 10:42 AM Erik Erlandson <[email protected]> wrote:

>
> What should I do to move this forward? Does Avro have a PIP process?
>
>
> On Sat, Jun 29, 2019 at 3:26 PM Erik Erlandson <[email protected]> wrote:
>
>>
>> Regarding the schema, my proposal for fingerprints is that units be
>> fingerprinted based on their canonical form, as defined here
>> <http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/>.
>> Any two unit expressions with the same canonical form (including the
>> corresponding coefficients) are exactly equivalent, so their fingerprints
>> can be the same. By convention, the unit could possibly be stored on the
>> schema in canonical form, although canonical forms are frequently less
>> intuitive to read, so doing this might reduce the documentation value of
>> the unit for humans examining the schema.
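A minimal Python sketch of fingerprinting by canonical form, in the spirit of the linked post rather than its exact algorithm. The unit table `UNITS`, the list-of-(name, power) representation, and the helpers `canonical` and `fingerprint` are all hypothetical; the sketch assumes each named unit reduces to an exact coefficient times a product of base units, and hashes that reduced form with SHA-256.

```python
import hashlib
from fractions import Fraction

# Hypothetical definitions: unit name -> (exact coefficient, {base unit: exponent}).
UNITS = {
    "second": (Fraction(1), {"second": 1}),
    "minute": (Fraction(60), {"second": 1}),
    "meter": (Fraction(1), {"meter": 1}),
    "kilometer": (Fraction(1000), {"meter": 1}),
    "byte": (Fraction(1), {"byte": 1}),
}


def canonical(expr):
    """Reduce [(unit name, integer power), ...] to (coefficient, sorted base exponents)."""
    coef = Fraction(1)
    exps = {}
    for name, power in expr:
        c, bases = UNITS[name]
        coef *= c ** power
        for base, e in bases.items():
            exps[base] = exps.get(base, 0) + e * power
    # Drop bases that cancelled out and sort, so equivalent expressions compare equal.
    return coef, tuple(sorted((b, e) for b, e in exps.items() if e != 0))


def fingerprint(expr):
    """Hash a stable text rendering of the canonical form."""
    coef, bases = canonical(expr)
    text = f"{coef}|" + ",".join(f"{b}^{e}" for b, e in bases)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With this scheme, kilometer/minute and meter/second get different fingerprints because their coefficients differ (50/3 versus 1), while meter·second/second and meter share a fingerprint, as intended.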
>>
>> For schema evolution, a unit change where the previous and new units are
>> convertible (also defined at the above link) would be well defined, and
>> the automatic transformation would simply be the correct unit conversion
>> (e.g. seconds to milliseconds). If the unit changes to a non-convertible
>> unit (e.g. seconds to bytes), then no automatic transformation exists, and
>> attempting to resolve the old and new schemas would be an error. Note that
>> establishing the conversion assumes that both the original and new schemas
>> are available at read time.
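A minimal sketch of the resolution rule described above, assuming a hypothetical unit table in which each unit carries an exact coefficient and a base-unit signature. `conversion_factor`, `resolve_value`, and `UnitResolutionError` are illustrative names, not part of any Avro API. "Convertible" here means identical signatures; anything else is a resolution error.

```python
from fractions import Fraction

# Hypothetical table: unit name -> (coefficient relative to the base unit, base-unit signature).
UNITS = {
    "second": (Fraction(1), (("second", 1),)),
    "millisecond": (Fraction(1, 1000), (("second", 1),)),
    "byte": (Fraction(1), (("byte", 1),)),
}


class UnitResolutionError(Exception):
    """Raised when the writer's and reader's units are not convertible."""


def conversion_factor(writer_unit, reader_unit):
    """Exact factor mapping a value in writer_unit to reader_unit."""
    wc, wsig = UNITS[writer_unit]
    rc, rsig = UNITS[reader_unit]
    if wsig != rsig:
        raise UnitResolutionError(f"{writer_unit!r} is not convertible to {reader_unit!r}")
    return wc / rc


def resolve_value(value, writer_unit, reader_unit):
    """Convert a value written under the old schema's unit into the new schema's unit."""
    return value * conversion_factor(writer_unit, reader_unit)
```

`resolve_value(2, "second", "millisecond")` evaluates to 2000 (as a `Fraction`), while `resolve_value(2, "second", "byte")` raises `UnitResolutionError`. How an exact result would be coerced back into the reader's Avro type (for example a `long`) is left open in this sketch.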
>>
>>
>> On Sat, Jun 29, 2019 at 11:55 AM Niels Basjes <[email protected]> wrote:
>>
>>> I think we should approach this idea in two parts:
>>>
>>> 1) The schema. For example, does a different unit mean a different schema
>>> fingerprint even though the bytes remain the same? What does a different
>>> unit mean for schema evolution?
>>>
>>> 2) Language specifics. Scala offers different possibilities than Java.
>>>
>>> On Sat, Jun 29, 2019, 18:59 Erik Erlandson <[email protected]> wrote:
>>>
>>> > I've been puzzling over what can be done to support this in more
>>> > widely used languages. The dilemma in the current language ecosystem is
>>> > that languages with "modern" type systems (Haskell, Rust, Scala, etc.)
>>> > capable of supporting compile-time unit checking, in the particular
>>> > style I've been exploring, are not yet widely used.
>>> >
>>> > With respect to Java, a couple of approaches are plausible. One is to
>>> > enhance the language, for example with Java 8 compiler plugins. Another
>>> > might be to implement a unit type system similar to squants
>>> > <https://github.com/typelevel/squants>. This style of unit type system
>>> > is not as flexible or intuitive as what can be done with Scala's latest
>>> > type system sorcery, but it would allow the community to build out a
>>> > Java-native type system that supports compile-time unit analysis. Its
>>> > coverage of standard units could also be made very good, as squants
>>> > itself demonstrates.
>>> >
>>> > Python would also be a high-coverage target. I'm even less sure what to
>>> > do for Python, as it has no compile-time type checking, but perhaps a
>>> > squants-like Python class system would add value. Maybe Python's new
>>> > type-hints feature could be leveraged?
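One way the type-hints idea might look, as a hedged sketch: unit marker classes used as type parameters of a generic `Quantity`, so that a static checker such as mypy should be able to flag adding seconds to bytes, with a runtime check as a fallback because Python does not enforce hints. `Unit`, `Second`, `Byte`, and `Quantity` are hypothetical names; this is not squants' API, only a loose analogue of its style.

```python
from dataclasses import dataclass
from typing import Generic, Type, TypeVar


class Unit:
    """Hypothetical marker base class; subclasses stand for units of measure."""


class Second(Unit):
    pass


class Byte(Unit):
    pass


U = TypeVar("U", bound=Unit)


@dataclass(frozen=True)
class Quantity(Generic[U]):
    value: float
    unit: Type[U]

    def __add__(self, other: "Quantity[U]") -> "Quantity[U]":
        # Statically, U is fixed by `self`, so a checker can reject
        # Quantity[Second] + Quantity[Byte]. Hints are not enforced at
        # runtime, so compare the unit tags explicitly as well.
        if other.unit is not self.unit:
            raise TypeError(
                f"cannot add {other.unit.__name__} to {self.unit.__name__}"
            )
        return Quantity(self.value + other.value, self.unit)
```

Usage: `Quantity(1.0, Second) + Quantity(2.0, Second)` gives `Quantity(3.0, Second)`, while `Quantity(1.0, Second) + Quantity(2.0, Byte)` raises `TypeError` at runtime.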
>>> >
>>> > Regarding unit expression representation, in broad strokes I'm not
>>> > unhappy with what I've prototyped in `coulomb-avro`, but it has
>>> > deficiencies that would need addressing. It doesn't yet support standard
>>> > unit abbreviations, nor does it understand plurals (e.g. it can parse
>>> > "second" but not "seconds"). Since its "unit" field is just a custom
>>> > metadata key, there is no enforcement. Parsers are currently instantiated
>>> > via explicit lists of types, which is a property I like, but that may not
>>> > work well in a world where multiple language bindings must be supported
>>> > in a portable manner.
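To illustrate the "custom metadata key" point, here is a hedged Python sketch that reads a `unit` attribute from an Avro schema using only the standard `json` module. The Avro specification permits attributes it does not define as metadata, which is why nothing enforces the key. The alias table, the naive plural rule, and the names `normalize_unit` and `field_units` are hypothetical; they illustrate the abbreviation and plural gaps mentioned above, not coulomb-avro's actual parser.

```python
import json

# Example schema with a non-standard "unit" attribute on each field.
SCHEMA = """
{
  "type": "record",
  "name": "Measurement",
  "fields": [
    {"name": "duration", "type": "double", "unit": "seconds"},
    {"name": "size", "type": "long", "unit": "MB"}
  ]
}
"""

# Hypothetical alias table and unit vocabulary.
ABBREVIATIONS = {"s": "second", "ms": "millisecond", "B": "byte", "MB": "megabyte"}
KNOWN_UNITS = {"second", "millisecond", "byte", "megabyte"}


def normalize_unit(text):
    """Map abbreviations and simple plurals onto a known unit name."""
    name = ABBREVIATIONS.get(text, text)
    if name not in KNOWN_UNITS and name.endswith("s") and name[:-1] in KNOWN_UNITS:
        name = name[:-1]  # naive plural handling: "seconds" -> "second"
    if name not in KNOWN_UNITS:
        raise ValueError(f"unknown unit: {text!r}")
    return name


def field_units(schema_json):
    """Collect normalized units for record fields that declare one."""
    schema = json.loads(schema_json)
    return {f["name"]: normalize_unit(f["unit"]) for f in schema["fields"] if "unit" in f}
```

The trailing-"s" rule would misfire for unit names whose singular ends in "s", so a real design would likely list plural forms explicitly.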
>>> >
>>> >
>>> >
>>> > On Sat, Jun 29, 2019 at 1:46 AM Niels Basjes <[email protected]> wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > I attended your talk in Berlin and at the end I thought "too bad this
>>> > > is only Scala".
>>> > >
>>> > > I think it's a good idea to have this in Avro.
>>> > >
>>> > > The details will be tricky: How to encode the units in the schema for
>>> > > example.
>>> > > Especially because of the automatic conversion you spoke about.
>>> > >
>>> > > Niels
>>> > >
>>> > > On Fri, Jun 28, 2019, 23:58 Erik Erlandson <[email protected]> wrote:
>>> > >
>>> > > > Hi Avro community,
>>> > > >
>>> > > > Recently I have been experimenting with Avro schemas that are extended
>>> > > > with a "unit" field. By "unit" I mean expressions like "second" or
>>> > > > "megabyte", that is, "units of measure".
>>> > > >
>>> > > > I delivered a short talk on my experiments at Berlin Buzzwords, which
>>> > > > can be viewed here:
>>> > > > https://www.youtube.com/watch?v=qrQmB2KFKE8
>>> > > > I also wrote a short blog post that may be faster to ingest:
>>> > > > http://erikerlandson.github.io/blog/2019/05/23/unit-types-for-avro-schema-integrating-avro-with-coulomb/
>>> > > >
>>> > > > I received some audience interest in making this concept "first class"
>>> > > > for Avro, so I'm writing to see what the Avro dev community thinks of
>>> > > > the idea. One issue is that this kind of unit checking is currently
>>> > > > only available for Scala (specifically Scala 2.13+).
>>> > > >
>>> > > > The Scala project itself is here:
>>> > > > https://github.com/erikerlandson/coulomb
>>> > > >
>>> > > > Cheers,
>>> > > > Erik
>>> > > >
>>> > >
>>> >
>>>
>>
