I forgot about your last question. I suggest creating a sub-task. Can you create one? If not, I will create it for you (menu "More Actions > Create sub-task").

Best regards,

Alfonso Nishikawa

2013/2/6 Alfonso Nishikawa <[email protected]>

> Hi Renato,
>
> I saw in the code that Cassandra has its own serializers. Can you give us
> a small summary of how it works and what it affects before your
> modifications? This will help in understanding your approaches.
>
> Does Cassandra have some penalties for the new column? In HBase that
> approach is not necessary since the union index gets serialized (by Avro)
> and stored before the proper data (I know you know that :) just a
> reminder).
>
> About generating classes, there's no need to modify the compiler (check if
> you really need to modify it). Taking into account that a union can't have
> two of the same type (Avro spec):
> - When you are writing, you can implement the approach Avro shows in
>   GenericData#resolveUnion():333 [0] (Avro 1.3.3), called from [1], which
>   iterates over the union types until one matches the type of the data
>   being written.
> - When reading, you know the index. Avro's approach is in [2].
>
> I suggest not modifying it (if possible) because HBase would get a
> duplicated state, where one copy will be ignored and becomes noise in the
> structures.
> My opinion, of course :)
>
> Thanks for all!!
>
> Best regards,
>
> Alfonso Nishikawa
>
> [0] -
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/avro/1.3.3/org/apache/avro/generic/GenericData.java?av=f#333
> [1] - GenericDatumWriter#write():59 -
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/avro/1.3.3/org/apache/avro/generic/GenericDatumWriter.java?av=f#59
> [2] - GenericDatumReader#read():84 -
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/avro/1.3.3/org/apache/avro/generic/GenericDatumReader.java?av=f#77
>
> 2013/2/6 Renato Marroquín Mogrovejo <[email protected]>
>
>> Hi all,
>>
>> This is a really long overdue email.
>> Finally I got the time to get
>> around to this while I am on holidays (:
>>
>> I've made some changes to Gora-Cassandra to support Avro Union data
>> types, even though Cassandra doesn't rely on Avro for serializing data.
>> So what has been done is a workaround to save specialized data
>> types, e.g. UNIONS. I faced the same problems and doubts that Alfonso
>> described, and Alfonso, your post was very illustrative mate ;)
>>
>> I will just explain the general approach so the changes can be
>> understood. The changes themselves can be found inside the code, or
>> reply to this email to talk about them.
>>
>> ** For storing Union data **
>> We create a new column only at the moment when we are
>> flushing the data into the data store. This generated column
>> stores the index of the schema used within the Union data type.
>>
>> ** For retrieving Union data **
>> Gora can already retrieve the data directly from Cassandra by
>> itself. The problem here was to determine which serializer to use
>> while getting this data back. So the first thing to do is to get the
>> value stored within the generated column, and use that value to select
>> the appropriate serializer. After that, it is just a matter of using
>> what Gora already has.
>>
>> ** For generating classes **
>> I am not particularly happy with the changes I've made here. I changed
>> GoraCompiler directly to create the extra field that stores the selected
>> schema of the Union data type. I tried to only add a new field to the
>> schema before compiling and then let the compiler work, but I kept on
>> getting a lock exception from Avro, which didn't let me make
>> this change the way I wanted. If anybody could help me out on how to do it,
>> then give me a shout! :)
>>
>> I didn't know whether to upload this patch to Gora-174, because it
>> addresses an issue caused by it, or to create a new issue to handle
>> the Avro Union per data store.
>> Thanks for reading until the end!
>>
>>
>> Renato M.

--
"Drinking bloody marys all night will make you feel like a corpse in the
morning."
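
To make the write and read paths discussed in the quoted messages more concrete, here is a minimal sketch of the idea: resolve the union index by iterating over the branches when writing, store that index in an extra generated column, and use it to pick the deserializer when reading. This is not the Gora or Avro code. The class name, the `_UnionIndex` column suffix, the in-memory `Map` standing in for a Cassandra row, and the use of Java classes as union branches are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnionIndexSketch {

    // Hypothetical stand-in for an Avro union such as ["null", "string", "long"].
    // A union cannot hold the same type twice, so the first match is the only match.
    public static final List<Class<?>> UNION_BRANCHES =
            Arrays.asList(Void.class, String.class, Long.class);

    // Hypothetical suffix for the generated column that holds the union index.
    public static final String INDEX_SUFFIX = "_UnionIndex";

    /** Write side: iterate over the branches until one matches the datum. */
    public static int resolveUnion(List<Class<?>> branches, Object datum) {
        for (int i = 0; i < branches.size(); i++) {
            Class<?> branch = branches.get(i);
            boolean matches = (datum == null) ? branch == Void.class : branch.isInstance(datum);
            if (matches) {
                return i;
            }
        }
        throw new IllegalArgumentException("Datum is not in the union: " + datum);
    }

    /** Stores the value plus an extra column holding the index of the branch used. */
    public static void put(Map<String, byte[]> row, String column, Object datum) {
        int index = resolveUnion(UNION_BRANCHES, datum);
        row.put(column + INDEX_SUFFIX, new byte[] {(byte) index});
        row.put(column, serialize(index, datum));
    }

    /** Read side: the stored index selects the deserializer. */
    public static Object get(Map<String, byte[]> row, String column) {
        int index = row.get(column + INDEX_SUFFIX)[0];
        return deserialize(index, row.get(column));
    }

    // One toy serializer per branch; a real store would plug in its own serializers.
    static byte[] serialize(int index, Object datum) {
        switch (index) {
            case 0: return new byte[0];
            case 1: return ((String) datum).getBytes(StandardCharsets.UTF_8);
            case 2: return ByteBuffer.allocate(Long.BYTES).putLong((Long) datum).array();
            default: throw new IllegalArgumentException("Unknown union index: " + index);
        }
    }

    static Object deserialize(int index, byte[] bytes) {
        switch (index) {
            case 0: return null;
            case 1: return new String(bytes, StandardCharsets.UTF_8);
            case 2: return ByteBuffer.wrap(bytes).getLong();
            default: throw new IllegalArgumentException("Unknown union index: " + index);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> row = new HashMap<>();
        put(row, "value", "gora");
        System.out.println(get(row, "value"));
        put(row, "value", 42L);
        System.out.println(get(row, "value"));
    }
}
```

In this sketch the extra column costs one additional write and one additional read per union field, which is the kind of penalty the question about Cassandra is asking about.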

