Hi,

2013/2/10 Alfonso Nishikawa <[email protected]>:
> Hi Renato,
>
>> So what you are proposing is to store an extra index at the beginning
>> of the actual value? or does HBase do this automatically? What about
>> if bytes were being written? couldn't some type of corruption happen
>> and make this unusable?
>
> The extra byte at the beginning of the actual value is part of Avro :)
> Gora-hbase must adhere to avro specs, so that is really the union
> sourcecode update.
> In the case of bytes, first is encoded a 'long' with the length of the
> bytes, followed with the bytes data.
> I got all from Avro Specs at [2].

Thanks! I overlooked the binary encoding specification ;)
The problem with Cassandra is that not everything is written down as
bytes (well, it probably is, but deeper down in the code). Please look
at the column types [1]. So what would you suggest doing in cases where
non-appendable column types are used, e.g. BooleanType, UUIDType, and
others?
I mean, in columns storing integers or decimals, I think we could
append a single value to determine which type of serializer to use, but
I don't know what to do in those other cases.

>>> think now is better expressed.
>>> If no one think is wrong, I will implement solution-1 and solution-2(this
>>> means maybe quite work, so do we maintain it? -I vote yes).
>>
>> So does your solution have two parts? or are they two separate
>> possible solutions?
>
> There are two potential different problems (incompatibilities with
> legacy data), so we can choose to leave them behind both, only one, or
> none. Lewis voted for facing both (same as I), so I guess we will
> maintain data compatibility until version 1.0.

This is a part I do not understand very well. You guys are saying that
legacy data is a problem, but why is it a problem if we haven't been
supporting Avro Union in the past? This is a new feature, not an
upgrade.
And from what I understand, the second issue was about marking the
support for Union data types as deprecated. But then again, if we are
able to support Union data types, this would be the first time. Am I
understanding things correctly here? Lewis? Alfonso? Anyone else?

>> You said on another email that HBase could persist Union data types
>> directly without having to modify it (did I get that right? or am I
>> confusing stuff? ) so implementing this would be just to tell HBase to
>> save the union data type but not actually writing this extra byte? I
>> wasn't able to find the avro documentation talking about this, could
>> you please point me to where this is?
>
> Sorry, surely my fault because I always express myself wrong. You need
> to write that index. Solution 1 [3] avoids writing that index but is
> an exception for only null-or-onetype unions.

Ok, I see. But what about unions with more than one type? Shouldn't we
think about solving this once and for all?
We also have to keep in mind that the same solution might not be
applicable to all data stores, but we should be able to provide the
same features across all the supported data stores.

>>> I had to restore my git server, but in this case not all went right, so now
>>> is up again at [1].
>>
>> Thanks! and great work documenting this issue! (:

Renato M.

[1] http://www.datastax.com/docs/1.0/ddl/column_family#about-data-types-comparators-and-validators

>>
>> Renato M.
>
> Thank you for your comments and questions! :)
>
> Best regards,
>
> Alfonso Nishikawa
>
> [2] - http://avro.apache.org/docs/current/spec.html#binary_encoding
> [3] - https://people.apache.org/~alfonsonishikawa/gora-174.html
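
P.S. To make the encoding discussed above concrete, here is a minimal
sketch of how the Avro binary encoding in [2] lays out a union value:
the branch index is written as an Avro 'long' (zig-zag, variable
length), followed by the encoded value, and 'bytes' are written as a
'long' length followed by the raw data. The function names are
hypothetical and only for illustration; this is not Gora or Avro
library code, and it assumes values fit in a signed 64-bit long.

```python
def encode_long(n: int) -> bytes:
    """Encode an integer as an Avro 'long': zig-zag, then base-128 varint.

    Assumes n fits in a signed 64-bit range.
    """
    z = (n << 1) ^ (n >> 63)  # zig-zag: maps small negatives to small positives
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)  # final group, continuation bit clear
    return bytes(out)


def encode_bytes(data: bytes) -> bytes:
    """Encode Avro 'bytes': a 'long' length followed by the raw data."""
    return encode_long(len(data)) + data


def encode_union(branch_index: int, encoded_value: bytes) -> bytes:
    """Encode a union value: the zero-based branch index, then the value.

    encoded_value must already be encoded per the schema of that branch
    (empty for the 'null' branch).
    """
    return encode_long(branch_index) + encoded_value


# Illustrative schema assumed here: ["null", "bytes"]
null_value = encode_union(0, b"")                         # null branch
bytes_value = encode_union(1, encode_bytes(b"\x01\x02"))  # bytes branch
print(null_value.hex(), bytes_value.hex())
```

Note that the prefix identifies the branch position within the union
schema, so a reader needs the same schema to interpret it. For unions
with few branches it takes a single byte, which is the "extra byte"
mentioned in the thread.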

