Scott,

Thanks for your response.
I completely agree that your use cases are valuable. I think we also agree
that the right place to layer this is as a separate "translation" or
"transformation" library. I think madness lies in pushing those
transformations into schema JSON; that's not what you're proposing,
however, so all is good.

-- Philip

On Thu, Feb 10, 2011 at 10:28 AM, Scott Carey <[email protected]> wrote:

>
>
> On 2/4/11 1:16 PM, "Philip Zeyliger" <[email protected]> wrote:
>
> >On Fri, Feb 4, 2011 at 10:02 AM, Scott Carey
> ><[email protected]> wrote:
> >
> >> I have been thinking about more advanced type promotion in Avro after
> >> facing more complicated schema evolution issues.
> >
> >
> >My two cents:
> >
> >This way lies madness. Avro (and PB and Thrift) give you some basic tools
> >to evolve an API without doing much extra code. At some point, you end up
> >forking and creating an APIv2, and eventually deprecate APIv1. If you try
> >to make that magical, you'll end up building a programming language.
>
> I agree that protocol APIv1 versus APIv2 is an example where exotic
> conversions don't make a lot of sense. The schemas in a protocol API
> aren't persisted long term; they exist only on the wire.
>
> My use cases are in long-term persisted file data, where schema evolution
> spans a much longer time window (forever, unless I can re-write all data).
> Having file format v1 not be compatible with file format v2 is a lot
> harder to swallow than API v1 not being compatible with API v2.
>
> I have another use case in mind as well. Schema transformation is a
> common need for interoperation with other frameworks. Cascading doesn't
> support nested records (or it didn't last I looked), so a Cascading Tap
> has to either flatten them or not support them. Pig doesn't support
> unions, so they are either not supported or manipulated into non-union
> structures. Schema transformation is a common use case when integrating
> Avro with pre-existing systems.
> When working on Pig and Hive adapter prototypes, there turned out to be a
> lot of overlap and repeated work -- and it's almost all in schema
> transformation (flattening, unions, etc.), classification (recursive?),
> and translation.
> If there were a general helper library for this sort of work, the
> remaining adapter would be rather small and would not require so much
> Avro domain knowledge.
>
> >
> >By all means define a language that converts from one Avro record into
> >another. An Avro expression language would be quite useful, actually.
> >Putting it in the core, however, strikes me as feature creep.
>
> Core should definitely remain simple. Anything like this should be an
> optional library. Support for each transformation should be optional as
> well -- many languages might have string <> int conversion, while only a
> couple might have union branch materialization.
>
> The more complicated transforms are mostly useful for frameworks that
> want to use Avro in a way that can interop with other frameworks using
> Avro.
>
> The initial reaction to the above statement is probably, "If they are
> both using Avro already, shouldn't they automatically be able to share
> data?" The answer is no. They aren't using Avro as their internal schema
> system. They are _translating_ between their internal schema system and
> Avro, potentially applying various transformation rules. So, for the
> lowest-common-denominator schemas it works fine; anything more
> complicated won't.
> This is not a fault of Avro; it is the nature of compatibility between
> two non-Avro schema systems.
> Hive supports maps with integers as keys. Pig does not. These can be
> made to interop through Avro if both systems share their schema
> translation techniques, but not otherwise.
>
> >
> >-- Philip
> >
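For concreteness, here is a rough sketch of the "union branch
materialization" transform Scott describes above. It is purely
illustrative -- a hypothetical helper manipulating Avro schema JSON as
plain Python dicts, not an existing Avro API. It rewrites a union as a
record with one nullable field per branch, which a union-less system such
as Pig can represent:

    import json

    def materialize_union(union_schema, record_name):
        # Turn an Avro union into a record with one nullable field per
        # branch; exactly one field is expected to be non-null at a time.
        fields = []
        for branch in union_schema:
            if branch == "null":
                continue  # implied: every materialized field is nullable
            if isinstance(branch, str):
                name = branch                      # primitive, e.g. "string"
            else:
                name = branch.get("name", branch["type"])  # named/complex type
            fields.append({"name": name.replace(".", "_"),
                           "type": ["null", branch],
                           "default": None})
        return {"type": "record", "name": record_name, "fields": fields}

    # A union Pig cannot represent directly:
    print(json.dumps(materialize_union(["null", "string", "long"],
                                       "StringOrLong"), indent=2))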
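The Hive/Pig map case is similar. Avro map keys are always strings, so a
map with integer keys cannot be expressed directly; a shared convention
(hypothetical, shown here as an array of key/value records) is the kind of
schema translation technique both adapters would have to agree on:

    import json

    def int_key_map_schema(value_schema, entry_name):
        # Represent map<int, V> as an array of {key, value} records,
        # since Avro maps only permit string keys.
        return {"type": "array",
                "items": {"type": "record",
                          "name": entry_name,
                          "fields": [{"name": "key", "type": "int"},
                                     {"name": "value", "type": value_schema}]}}

    print(json.dumps(int_key_map_schema("double", "IntKeyEntry"), indent=2))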
