Enis,

Thanks for the pointers. Are the dirty bits used only by Map/Reduce, or also for general persistence in application logic? In the latter case I guess it's OK for them to be transient. If Map/Reduce is the only other use case, something could maybe be done in the input and output formats to avoid fiddling with the pseudo-official Avro APIs.
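
To make the idea concrete, here is a minimal sketch of keeping the dirty bits transient on a parent class and carrying them only through the Map/Reduce serialization step. Everything in it is an assumption for illustration: DirtyTracked, Page, and ShuffleCodec are hypothetical names, not Gora or Avro classes, and the writeUTF call is only a stand-in for wherever the Avro-encoded record would be written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical parent class: dirty state lives outside the record schema.
abstract class DirtyTracked {
    private final transient BitSet dirty = new BitSet();

    protected void markDirty(int fieldIndex) { dirty.set(fieldIndex); }
    boolean isDirty(int fieldIndex) { return dirty.get(fieldIndex); }
    void clearDirty() { dirty.clear(); }

    // Export and restore the bits so a serializer can carry them alongside the record.
    byte[] dirtyBytes() { return dirty.toByteArray(); }
    void restoreDirty(byte[] bytes) {
        dirty.clear();
        dirty.or(BitSet.valueOf(bytes));
    }
}

// Hypothetical record with a single field at index 0.
class Page extends DirtyTracked {
    private String url;

    String getUrl() { return url; }
    void setUrl(String url) { this.url = url; markDirty(0); }
    // Stand-in for a decoder populating the field without dirtying it.
    void loadUrl(String url) { this.url = url; }
}

// Hypothetical stand-in for the map-to-reduce serialization layer.
final class ShuffleCodec {
    static byte[] write(Page page) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            byte[] bits = page.dirtyBytes();
            out.writeInt(bits.length);   // dirty bits travel as a prefix
            out.write(bits);
            out.writeUTF(page.getUrl()); // placeholder for the Avro-encoded record
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Page read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte[] bits = new byte[in.readInt()];
            in.readFully(bits);
            Page page = new Page();
            page.loadUrl(in.readUTF());
            page.restoreDirty(bits);     // mutations made in map stay visible in reduce
            return page;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something along these lines, the store-side serialization would never see the bits, and only the Map/Reduce path would need to know about them.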
On Fri, May 18, 2012 at 2:05 PM, Enis Söztutar <[email protected]> wrote:
> Hi Ed,
>
> Good to see some interest in pushing things forward.
>
> As the javadoc says, FakeResolvingDecoder is pretty much a big dirty hack
> to work around Avro's internals, but as you pointed out much has changed in
> Avro, so we may have to rethink those parts.
>
> We need the dirty bits in the serialization for mapreduce, but not for the
> final serialization at the store (HBase, Cassandra, etc). The reasoning is
> that during the map and reduce phases, we may mutate the objects in map,
> which are then serialized, deserialized in reduce, and used there.
>
> I have not spent any time on the changes in Avro for some time, so cannot
> comment on what would be the cleanest way to go. Either way, we can augment
> the schema, or hijack DatumReaders/Writers. If you are willing to work on
> this, I think it is best to find out what is public / stable in Avro, and
> extend those parts. When we first wrote these parts, Avro was very young,
> and it was not clear what was the public API. Maybe consulting Avro folks,
> and pushing for changes / hooks in Avro so that things don't break is a
> good option.
>
> I don't believe we need anything other than dirty bits to be augmented. If
> you are planning to work on this, feel free to reach out.
>
> Cheers,
> Enis
>
> On Fri, May 18, 2012 at 8:45 AM, Ed Kohlwey <[email protected]> wrote:
>
>> Hi,
>> I'm working on updating Gora to Avro 1.7. I've mostly figured out what
>> I need to do, except what's happening in FakeResolvingDecoder.java.
>>
>> Avro now uses a nice factory system which essentially prevents you
>> from extending some of these core classes, so a different workaround
>> will have to do.
>>
>> It looks like this is basically a way to work around having dirty bits
>> added to the Avro protocol. Is that right? Has there been any
>> historical discussion of doing things differently, like augmenting
>> record schemas to include dirty bits, or making the dirty bits a
>> transient member of a parent class? Or am I off base here?
>>
>> Is there any augmenting done other than dirty bits?
>>

