Enis,
Thanks for the pointers. Are the dirty bits only used by Map/Reduce, or
also for general persistence in terms of application logic? I guess in the
latter case it's ok for them to be transient, and if the only other use
case is in Map/Reduce, something could maybe be done in the input and
output formats to avoid fiddling with the pseudo-official Avro APIs.
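
To make that idea concrete, here is a rough sketch of carrying the dirty
bits next to the record only on the Map/Reduce path, instead of inside the
Avro encoding. It uses plain java.io streams rather than Avro or Hadoop
classes, and every name in it (DirtyBitsSketch, Page, writeForShuffle,
readFromShuffle) is hypothetical and not part of Gora or Avro.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

public class DirtyBitsSketch {

    /** Hypothetical stand-in for a persistent record; not Gora's actual class. */
    static class Page {
        String url;
        String content;
        // One bit per field, kept outside the record's own schema.
        final BitSet dirty = new BitSet(2);

        void setUrl(String v) { url = v; dirty.set(0); }
        void setContent(String v) { content = v; dirty.set(1); }
    }

    /** Map-side write: a length-prefixed dirty-bit header, then the fields. */
    static byte[] writeForShuffle(Page p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] bits = p.dirty.toByteArray();
        out.writeInt(bits.length);
        out.write(bits);
        out.writeUTF(p.url);
        out.writeUTF(p.content);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reduce-side read: restore the dirty bits along with the fields. */
    static Page readFromShuffle(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[] bits = new byte[in.readInt()];
        in.readFully(bits);
        Page p = new Page();
        // Assign fields directly so that reading does not mark them dirty.
        p.url = in.readUTF();
        p.content = in.readUTF();
        p.dirty.or(BitSet.valueOf(bits));
        return p;
    }

    public static void main(String[] args) throws IOException {
        Page p = new Page();
        p.url = "http://example.org/";
        p.content = "old";
        p.setContent("new"); // mutated in the map phase
        Page q = readFromShuffle(writeForShuffle(p));
        System.out.println("dirty fields after round trip: " + q.dirty);
    }
}
```

A writer for the final store would skip the header and write only the
fields, so the stored form would not carry dirty bits at all.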

On Fri, May 18, 2012 at 2:05 PM, Enis Söztutar <[email protected]> wrote:
> Hi Ed,
>
> Good to see some interest in pushing things forward.
>
> As the javadoc says, FakeResolvingDecoder is pretty much a big dirty hack
> to work around Avro's internals, but as you pointed out much has changed in
> Avro, so we may have to rethink those parts.
>
> We need the dirty bits in the serialization for mapreduce, but not for the
> final serialization at the store (hbase, cassandra, etc). The reasoning is
> that during the map and reduce phases, we may mutate objects in map, which
> are then serialized, deserialized in reduce, and used there.
>
> I have not spent any time on the changes in Avro for some time, so I cannot
> comment on what would be the cleanest way to go. Either way, we can augment
> the schema, or hijack DatumReaders/Writers. If you are willing to work on
> this, I think it is best to find out what is public / stable in Avro, and
> extend those parts. When we first wrote these parts, Avro was very young,
> and it was not clear what the public API was. Maybe consulting the Avro
> folks, and pushing for changes / hooks in Avro so that things don't break,
> is a good option.
>
> I don't believe we need anything other than dirty bits to be augmented. If
> you are planning to work on this, feel free to reach out.
>
> Cheers,
> Enis
>
> On Fri, May 18, 2012 at 8:45 AM, Ed Kohlwey <[email protected]> wrote:
>
>> Hi,
>> I'm working on updating Gora to Avro 1.7. I've mostly figured out what
>> I need to do, except what's happening in FakeResolvingDecoder.java.
>>
>> Avro now uses a nice factory system which essentially prevents you
>> from extending some of these core classes, so a different workaround
>> will have to do.
>>
>> It looks like this is basically a way to work around having dirty bits
>> added to the Avro protocol. Is that right? Has there been any
>> historical discussion of doing things differently, like augmenting
>> record schemas to include dirty bits, or making the dirty bits a
>> transient member of a parent class? Or am I off base here?
>>
>> Is there any augmenting done other than dirty bits?
>>
