Thanks Jacques and Ryan for the insights!
I'm going to try something based on the RecordConsumer model.
MK

On Thu, Apr 9, 2015 at 12:57 PM, Ryan Blue <[email protected]> wrote:

> Excellent point about unions at too high of a level, which I never thought
> about. The best practice is definitely to add the new column with a default
> instead of versioning the entire record! I wonder if there is something we
> can do about that.
>
> rb
>
> On 04/08/2015 06:03 PM, Jacques Nadeau wrote:
>
>> I agree with what Ryan said.  In terms of implementation effort, using
>> the existing object models is great.
>>
>> However, as you try to tune your application, you may find suboptimal
>> transformation patterns to the physical format.  This is always a possible
>> risk when working through an abstraction.  The example I've seen previously
>> is that people might create a union at a level higher than is necessary.
>> For example, imagine
>>
>> old: {
>>    first:string
>>    last:string
>> }
>>
>> new: {
>>    first:string
>>    last:string
>>    twitter_handle:string
>> }
>>
>> People are inclined to union (old,new).  Last I checked, the default Avro
>> behavior in this situation would be to create five columns: old_first,
>> old_last, new_first, new_last, and new_twitter_handle (names are actually
>> nested as group0.x, group1.x or something similar).  Depending on what is
>> being done, this can
>> be suboptimal as a logical query of "select table.first from table" now has
>> to read two columns, manage two possibly different encoding schemes, etc.
>> This will be even more impactful as we implement things like indices in the
>> physical layer.
>>
>> In short, if you are using an abstraction, be aware that the physical
>> layout may not be as optimal as it would have been if you had hand-tuned
>> the schema with your particular application in mind.  The flip-side is you
>> save time and aggravation in implementation.
>>
>> Make sense?
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>
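
To make the column counts discussed above concrete, here is a minimal sketch in plain Python (no Avro or Parquet library involved). It writes the old/new records from Jacques's example as Avro-style schema dicts and lists the leaf columns a naive record-to-column mapping might produce. The `leaf_columns` helper, the `EVOLVED` schema, the record name `person`, and the `group0`/`group1` prefixes are all assumptions for illustration; the real names generated by the Avro object model may differ.

```python
# Avro-style schemas as plain dicts; the field names come from the thread.
OLD = {"type": "record", "name": "old", "fields": [
    {"name": "first", "type": "string"},
    {"name": "last", "type": "string"},
]}
NEW = {"type": "record", "name": "new", "fields": [
    {"name": "first", "type": "string"},
    {"name": "last", "type": "string"},
    {"name": "twitter_handle", "type": "string"},
]}
# Hypothetical evolved record: one schema, new field is nullable with a default.
EVOLVED = {"type": "record", "name": "person", "fields": [
    {"name": "first", "type": "string"},
    {"name": "last", "type": "string"},
    {"name": "twitter_handle", "type": ["null", "string"], "default": None},
]}


def leaf_columns(schema, prefix=""):
    """Return dotted paths of the primitive leaves of an Avro-style schema.

    Illustrative mapping only: each branch of a multi-branch union becomes
    its own group (group0, group1, ...), while a ["null", T] union is
    treated as an optional T with no extra nesting.
    """
    if isinstance(schema, list):  # a union
        branches = [s for s in schema if s != "null"]
        if len(branches) == 1:
            return leaf_columns(branches[0], prefix)
        cols = []
        for i, branch in enumerate(branches):
            cols += leaf_columns(branch, f"{prefix}group{i}.")
        return cols
    if isinstance(schema, dict) and schema.get("type") == "record":
        cols = []
        for field in schema["fields"]:
            cols += leaf_columns(field["type"], f"{prefix}{field['name']}.")
        return cols
    return [prefix.rstrip(".")]  # primitive type: one leaf column


def columns_for(cols, field):
    """Columns a logical query on `field` would have to read."""
    return [c for c in cols if c.split(".")[-1] == field]


union_cols = leaf_columns([OLD, NEW])
evolved_cols = leaf_columns(EVOLVED)
print(union_cols)
print(columns_for(union_cols, "first"))
print(evolved_cols)
print(columns_for(evolved_cols, "first"))
```

Under these assumptions the union layout has five leaf columns and a query on `first` touches two of them, while the evolved single record has three leaf columns and the same query touches one.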
