I wrote GORA-142-v3.patch that supports several new types of serialization for gora-cassandra. https://issues.apache.org/jira/browse/GORA-142
Since I am not familiar with the other implementations such as gora-hbase, I'd like to hear your opinions on the serialization spec, especially for variable-length arrays.

Gora uses Avro for schema definition, but I noticed that gora-cassandra uses its own serialization based on Hector. For instance, the serialization of an integer is totally different between Avro (zig-zag) and gora-cassandra (Hector). Considering Cassandra's pre-defined validation classes and comparators, I think Hector's serialization is a better fit than Avro's for gora-cassandra, so my implementation of the GORA-142 patch is based on Hector's serializers.

For ARRAY support, I first implemented the GORA-138 patch with a Super CF, in the same way as RECORD or MAP.
https://issues.apache.org/jira/browse/GORA-138

As Enis mentioned in GORA-138, we may want another implementation that uses a single column for reasonably short arrays, so the GORA-142 patch supports ARRAY with a single-column implementation.

For an array of fixed-length elements, a single column can store multiple elements just by appending them sequentially. However, for an array of variable-length elements such as STRING or BYTES, it is impossible to retrieve each value if only the values are stored sequentially, so the GORA-142 patch stores the size of each element as an INTEGER before the actual value.

For instance, ["ABCDE", "abc", "1234"] is stored as

00 00 00 05 41 42 43 44 45 00 00 00 03 61 62 63 00 00 00 04 31 32 33 34

If there is no objection, I will commit the GORA-142 patch with the above serialization spec once it is ready.

Regards,
-Kaz
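
P.S. To make the length-prefixed layout above concrete, here is a minimal sketch. It is not the code in the GORA-142 patch: it uses plain java.nio.ByteBuffer instead of Hector's serializers, and the class and method names (LengthPrefixedArray, encode, decode, toHex) are hypothetical. It assumes the size is written as a 4-byte big-endian integer and that strings are encoded as UTF-8, which matches the example bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Gora or Hector).
class LengthPrefixedArray {

    // Write each element as a 4-byte big-endian size followed by its UTF-8 bytes.
    static byte[] encode(List<String> values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += 4 + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total); // ByteBuffer is big-endian by default
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    // Read size/value pairs back until the column value is exhausted.
    static List<String> decode(byte[] column) {
        ByteBuffer buf = ByteBuffer.wrap(column);
        List<String> values = new ArrayList<>();
        while (buf.hasRemaining()) {
            int size = buf.getInt();
            byte[] b = new byte[size];
            buf.get(b);
            values.add(new String(b, StandardCharsets.UTF_8));
        }
        return values;
    }

    // Format bytes as space-separated two-digit hex, like the example in this mail.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] column = encode(Arrays.asList("ABCDE", "abc", "1234"));
        System.out.println(toHex(column));
        System.out.println(decode(column));
    }
}
```

Because every element carries its own size, the reader never needs a delimiter, so BYTES values containing arbitrary bytes can be stored the same way. The cost is 4 extra bytes per element.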

