I am all for expanding the core types… the current logical type shortcuts in the IDL lang will make this a bit more interesting to implement (I think they confuse more than they help)…
Regarding ID-based field tracking, I am not sure I understand what problem it solves, and there might be better solutions for it. But these discussions should happen as part of the AEP process…

--Z

> On Apr 28, 2020, at 8:50 PM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> +1 for removing code that isn't maintained. We can still bring it back if
> anyone is interested, but I like the idea of retiring it so that users get
> a clear idea of its state (unmaintained) and so it doesn't slow down
> development (releases blocked by code rot). I support separate versioning
> and updating to semantic versioning, too!
>
> For the 2.0 format, I think there may be some other reasons to consider it
> as well.
>
> First, it would be great to expand the core set of types to include
> timestamps, dates, decimals, and maps with non-string keys. These are
> available through logical types, but logical types are difficult to
> configure and require deserialization and conversion instead of just
> deserialization. We could gain performance and make Avro much easier to use
> by adding to the core set of types.
>
> Second, I would like to see Avro adopt or support id-based field tracking
> in schemas. We've built this in Apache Iceberg so that schema evolution in
> Iceberg tables never has unintended side effects. For example, dropping a
> column and adding one with the same name never mixes the dropped column's
> data with the new column's data; and it's still possible to un-delete
> columns. Another benefit of id-based schemas is that producers and
> consumers don't need to coordinate schema changes or keep old aliases. The
> name of a column is whatever the id is labelled with in the reader's schema.
>
> I'm not sure that even these are enough to break compatibility with v1, but
> I think it's worth a discussion.
>
> On Tue, Apr 28, 2020 at 1:01 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> Huge +1 to recover the Avro Enhancement Proposals (AEP)
>>
>> The experimental features Ryan mentioned definitely merit(ed) to be
>> part of it, and in particular the procedure to decide when they will
>> become ‘stable’ or default, for example for fastread. Also other
>> proposals/discussions like the split release or semantic versioning
>> should be part of it.
>>
>> About Avro 2.0.0 I think breaking binary compatibility of the format
>> is going to prove to be a hard sell (are named unions valuable enough
>> to break backwards compatibility?). If we can extend the binary format
>> in a compatible way there is no reason to have 2.0.0, so I agree that
>> there is a delicate balance to keep, because strict stability could
>> also leave us ostracized.
>>
>> What I personally would like is to make Avro as lean and efficient as
>> possible and focus mostly on the binary format part and tools, probably
>> removing the less used parts (IPC/RPC/trevni), so it is good to see
>> that other people are starting to agree on that.
>>
>> One more radical idea I would like to try is to unify the
>> implementations a bit, probably having a robust low-level one in one
>> systems language (C or Rust) and bindings for all the languages that
>> rely on it, but this is probably more because of my frustration at
>> seeing projects that take this approach slowly becoming the standard
>> and Apache Avro relegated (this is already happening on the Python
>> front).
>>
>> In general the critical issue with Avro is the downstream
>> consequences of our actions, and of course we will always have
>> incomplete information, but we can investigate and see if changes are
>> worth it.
>>
>> Regards,
>> Ismaël
>>
>> On Mon, Apr 27, 2020 at 6:51 PM Ryan Skraba <r...@skraba.com> wrote:
>>>
>>> Hello!
>>>
>>> You bring up some good points -- I'm glad Avro is so widely used, but
>>> it does make me nervous to see any changes that might break other
>>> projects, or change any behaviour.
>>>
>>> Currently, we've talked about managing developer expectations with
>>> semantic versioning (especially with the necessary Jackson API cleanup
>>> that happened in 1.9.x), or versioning artifacts separately.
>>>
>>> We also have a couple of experimental/feature flags for some behaviour
>>> changes:
>>> https://cwiki.apache.org/confluence/display/AVRO/Experimental+features+in+Avro
>>>
>>> And there is already a page for Avro Enhancement Proposals that looks
>>> largely out of date:
>>> https://cwiki.apache.org/confluence/display/AVRO/Avro+Enhancement+Proposals
>>>
>>> Moving some of the extras to a separate repo brings many of the same
>>> problems as versioning artifacts separately (nobody wants to deal with
>>> a compatibility matrix). I'm definitely not against it, but I'm not
>>> sure how it would improve the situation.
>>>
>>> There's a fine line between being extremely stable and being
>>> paralyzed! I would be enthusiastic about any process changes that
>>> would help us encourage and adopt new features (and fixes) more
>>> quickly.
>>>
>>> All my best, Ryan
>>>
>>>
>>> On Sun, Apr 26, 2020 at 11:18 AM Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>
>>>> Hi Andy,
>>>>
>>>> Thanks for reaching out. Sorry for not being so active in the community
>>>> lately.
>>>>
>>>> Since Avro 1.8.2 there has been some activity on the repository again,
>>>> fixing stuff like security issues and migrating to later versions of
>>>> Java. Avro has been around for 10 years now, and I would like to keep
>>>> (some) backward compatibility to make sure that people are still going
>>>> to use it for another 10 years :) In the past, the idea was to keep the
>>>> format backward compatible; this excludes the Java API too.
>>>> So we did some changes to the API, such as removing Jackson from the
>>>> public API and aggressively migrating from Joda Time to Java JSR-310.
>>>> This caused a lot of issues because Avro is deeply nested in a lot of
>>>> projects. For example, it is a huge task to update Avro in Hive or
>>>> Hadoop. Therefore we believe that backward compatibility is very
>>>> important.
>>>>
>>>> And I agree that we should mainly focus on the Avro spec itself, and
>>>> not too much on File I/O and Network etc :) However, if we decide to
>>>> break an API, we should do it for a good reason.
>>>>
>>>> Cheers, Fokko
>>>>
>>>> On Wed, Apr 22, 2020 at 16:09, Andy Le <anhl...@gmail.com> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I'm new to this vibrant open source community. My story with Avro
>>>>> can be found here [1]
>>>>>
>>>>> While implementing the feature, I got stuck and had various
>>>>> discussions with Doug Cutting, Fokko Driesprong.... You may see
>>>>> here [2]
>>>>>
>>>>> Here are my (biased) observations about our current Avro 1.9.x:
>>>>>
>>>>> - Some improvements can't be made due to fear of backward
>>>>> incompatibilities. For example: specifications about named Union.
>>>>>
>>>>> - If `Apache Avro™ is a data serialization system.` then the
>>>>> repository `apache/avro` should solely focus on (de)serialization,
>>>>> right? Currently our repository contains many
>>>>> nice-to-have-but-not-critical things like: File I/O, Network I/O....
>>>>>
>>>>> IMHO, I think:
>>>>>
>>>>> - We should publicly gather RFCs for Avro 2.x
>>>>>
>>>>> - We should move such nice things out of Avro 2.x (maybe to other
>>>>> dedicated repositories)
>>>>>
>>>>> What do you think about my suggestions? Please kindly let me know.
>>>>>
>>>>> Thank you & be strong.
>>>>>
>>>>> [1] My fork: https://github.com/anhldbk/avro-fork#why-this-fork
>>>>> [2] My opened issue:
>>>>> https://issues.apache.org/jira/browse/AVRO-2808?jql=reporter%3Danhldbk%20AND%20resolution%20is%20EMPTY
>>>>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
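
For reference, here is a minimal sketch of the drop-and-re-add case Ryan Blue describes for id-based field tracking. It is only an illustration under stated assumptions: records and schemas are modelled as plain Python dicts and lists, and the names `resolve_by_name`, `resolve_by_id`, `writer_v1`, and `reader_v2` are hypothetical. None of this is the Avro or Iceberg API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema model: a list of (field id, field name) pairs.
# Assumption: ids are assigned once and never reused.
Field = Tuple[int, str]


def resolve_by_name(record: Dict[str, Any], reader: List[Field]) -> Dict[str, Any]:
    """Name-based resolution: a reader column gets whatever was stored under its name."""
    return {name: record.get(name) for _, name in reader}


def resolve_by_id(record: Dict[str, Any], writer: List[Field],
                  reader: List[Field]) -> Dict[str, Any]:
    """Id-based resolution: match columns on id; names are only labels."""
    value_by_id = {fid: record[name] for fid, name in writer if name in record}
    return {name: value_by_id.get(fid) for fid, name in reader}


# Data written while "status" was field id 2.
writer_v1 = [(1, "user"), (2, "status")]
old_record = {"user": "ada", "status": "legacy"}

# Later, "status" was dropped and a new, unrelated "status" was added with id 3.
reader_v2 = [(1, "user"), (3, "status")]

# A reader that keeps id 1 but labels it "login" (a rename, no alias needed).
reader_renamed = [(1, "login"), (2, "status")]

by_name = resolve_by_name(old_record, reader_v2)
by_id = resolve_by_id(old_record, writer_v1, reader_v2)
renamed = resolve_by_id(old_record, writer_v1, reader_renamed)

print(by_name)  # the dropped column's value shows up in the new column
print(by_id)    # the new column is empty for old data
print(renamed)  # the value follows the id, under the reader's label
```

In this toy model, name-based resolution hands the dropped column's data to the new column, while id-based resolution leaves the new column empty for old records and lets a reader rename a column without aliases.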