I wrote GORA-142-v3.patch, which supports several new types of
serialization for gora-cassandra.
https://issues.apache.org/jira/browse/GORA-142

Since I am not familiar with the other implementations such as gora-hbase,
I'd like to hear your opinions on the serialization spec, especially for
variable-length arrays.

Gora uses Avro for schema definition,
but I noticed that gora-cassandra uses its own serialization based on Hector.
For instance, the serialization of an integer is totally different between
Avro (zig-zag) and gora-cassandra (Hector).
Considering Cassandra's pre-defined validation classes and comparators,
I think Hector's serialization suits gora-cassandra better than Avro's,
so my implementation of the GORA-142 patch is based on Hector's serializers.
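
To make the difference concrete, here is a minimal sketch in plain Java.
It does not call Avro or Hector; the class and method names are made up
for illustration. It assumes Hector's integer serializer writes a fixed
4-byte big-endian value, which is consistent with the length bytes in the
example further down in this mail, and it follows the zig-zag plus
variable-length encoding described in the Avro spec for int.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper class, not part of Gora, Avro, or Hector.
class IntEncodingSketch {

    // Avro-style int: zig-zag mapping, then a base-128 variable-length encoding.
    static byte[] zigZagVarint(int n) {
        int z = (n << 1) ^ (n >> 31); // maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write(z); // last group, continuation bit clear
        return out.toByteArray();
    }

    // Assumed Hector-style int: fixed 4 bytes, big-endian.
    static byte[] fixedBigEndian(int n) {
        return ByteBuffer.allocate(4).putInt(n).array();
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(zigZagVarint(5)));   // prints 0A
        System.out.println(hex(fixedBigEndian(5))); // prints 00 00 00 05
    }
}
```

The fixed-width form keeps every integer at the same size and in
big-endian order, which is what a byte-wise integer validator or
comparator would expect; the zig-zag form is more compact but does not
have that property.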

For ARRAY support, I first implemented the GORA-138 patch with a Super CF,
in the same way as RECORD or MAP.
https://issues.apache.org/jira/browse/GORA-138
As Enis mentioned in GORA-138, we may want another implementation that
uses a single column for reasonably short arrays,
so the GORA-142 patch supports ARRAY with a single-column implementation.

For arrays of fixed-length elements, a single column can store multiple
elements by simply appending them sequentially.
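
As a rough illustration of the fixed-length case, the sketch below packs
int elements back to back and reads element i at byte offset i * 4. The
class and method names are hypothetical, and the 4-byte big-endian
element layout is an assumption rather than something taken from the
patch.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class, not part of the GORA-142 patch.
class FixedLengthArraySketch {

    // Writes each int as 4 big-endian bytes, one after another.
    static byte[] pack(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // The element count follows from the column size alone.
    static int length(byte[] column) {
        return column.length / Integer.BYTES;
    }

    // Element i starts at byte offset i * 4, so no per-element size is needed.
    static int elementAt(byte[] column, int index) {
        return ByteBuffer.wrap(column).getInt(index * Integer.BYTES);
    }
}
```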
However, for arrays of variable-length elements such as STRING or BYTES,
the individual values cannot be recovered if they are just stored
sequentially,
so the GORA-142 patch writes the size of each element as an INTEGER
before the actual value.
For instance, ["ABCDE", "abc", "1234"] is stored as
00 00 00 05 41 42 43 44 45 00 00 00 03 61 62 63 00 00 00 04 31 32 33 34
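
The layout above can be sketched in plain Java as follows. This is not
the code from the patch: the class and method names are hypothetical,
and UTF-8 is assumed for the string bytes (the example values are ASCII,
so the bytes are the same either way). Each element is written as a
4-byte big-endian length followed by its bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not the actual GORA-142 implementation.
class LengthPrefixedArraySketch {

    // Encodes each string as [4-byte big-endian length][bytes].
    static byte[] encode(List<String> values) {
        List<byte[]> chunks = new ArrayList<>();
        int total = 0;
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8); // charset is an assumption
            chunks.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] b : chunks) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    // Reads length-prefixed elements until the column value is exhausted.
    static List<String> decode(byte[] column) {
        ByteBuffer buf = ByteBuffer.wrap(column);
        List<String> values = new ArrayList<>();
        while (buf.hasRemaining()) {
            if (buf.remaining() < Integer.BYTES) {
                throw new IllegalArgumentException("truncated length prefix");
            }
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                throw new IllegalArgumentException("invalid element length: " + len);
            }
            byte[] b = new byte[len];
            buf.get(b);
            values.add(new String(b, StandardCharsets.UTF_8));
        }
        return values;
    }
}
```

Encoding ["ABCDE", "abc", "1234"] with this sketch yields the 24 bytes
shown above, and decoding them returns the original three strings. The
cost is 4 extra bytes per element.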

If there is no objection, I will commit the GORA-142 patch with the above
serialization spec once it is ready.

Regards,
-Kaz
