[ 
https://issues.apache.org/jira/browse/GORA-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated GORA-267:
--------------------------------------

    Description: 
The extension allows primary keys to be defined as Avro classes. A mapping 
specifies how the fields of the key class are mapped to the components of 
composite partition keys and composite column names. This gives users more 
control over how data is distributed across Cassandra database structures. It 
is now possible to store data in wide rows with custom indexes that allow for 
fast range scans on a single node. There is also no longer any need for an 
order-preserving partitioner, which is likely to compromise data distribution 
in the Cassandra cluster.

In essence, composite primary keys with identical partition parts are written 
to the same Cassandra row (which is essentially a partition). Within a row, 
entities are stored in lexical order of their cluster key components. Avro 
field names are appended as the last component of the composite column name. 
The current implementation does not replace super columns, so complex Avro 
fields are still mapped to super columns. Super column families use the same 
composite primary keys as simple column families. Because Gora always loads 
nested complex types in full, the use of super column families is not really a 
problem. In future work, however, super columns could be replaced by another 
level of column name components below the field qualifiers. It would also be 
possible to rethink the decomposition of complex nested types beyond the first 
level.
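
To illustrate the layout described above, here is a minimal in-memory sketch in 
plain Java. It is not the gora-cassandra code and uses no Cassandra API; the 
SensorKey class, its field names and the ":" separator are hypothetical and 
chosen only for this example. Sorted maps stand in for Cassandra rows, and the 
timestamp is zero-padded so that lexical order matches numeric order for 
non-negative values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeyLayoutSketch {

    /** Hypothetical key class: sensorId is the partition part, timestamp the cluster part. */
    public static final class SensorKey {
        public final String sensorId;
        public final long timestamp;

        public SensorKey(String sensorId, long timestamp) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
        }
    }

    /** Partition key -> (composite column name -> value); columns are kept in lexical order. */
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Builds the column name: cluster component first, Avro field name as the last component. */
    public static String columnName(SensorKey key, String avroField) {
        // Zero-padding (an assumption of this sketch) keeps lexical and numeric order aligned.
        return String.format("%020d:%s", key.timestamp, avroField);
    }

    /** Keys with the same partition part end up in the same row. */
    public void put(SensorKey key, String avroField, String value) {
        rows.computeIfAbsent(key.sensorId, k -> new TreeMap<>())
            .put(columnName(key, avroField), value);
    }

    /** Returns the column names of one row in their stored (lexical) order. */
    public List<String> columnsOf(String partition) {
        TreeMap<String, String> row = rows.get(partition);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }
}
```

Writing keys ("s1", 20) and ("s1", 3) puts both into row "s1", with the column 
for timestamp 3 sorted before the one for timestamp 20, regardless of insertion 
order.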

The implementation uses Gora's partitionQueries concept to decompose 
row-scanning queries into a set of queries that each operate on a single row. 
However, such a decomposition is not always possible, and real range scans are 
limited to wide rows (partitions).
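
The idea of the decomposition can be sketched as follows. This is a 
hypothetical illustration in plain Java, not Gora's partitionQueries API: it 
assumes time series keys whose partition part is a sensor id plus a one-day 
time bucket, so that a timestamp range can be enumerated into buckets. Where 
the partition parts covered by a scan cannot be enumerated like this, no such 
decomposition exists.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionQuerySketch {

    /** One partition per sensor and day; the bucket size is an assumption of this sketch. */
    public static final long BUCKET_MILLIS = 24L * 60 * 60 * 1000;

    /** A range scan that touches exactly one row (partition). */
    public static final class RowQuery {
        public final String sensorId;
        public final long bucket;   // partition part
        public final long startTs;  // inclusive lower bound on the cluster part
        public final long endTs;    // inclusive upper bound on the cluster part

        public RowQuery(String sensorId, long bucket, long startTs, long endTs) {
            this.sensorId = sensorId;
            this.bucket = bucket;
            this.startTs = startTs;
            this.endTs = endTs;
        }
    }

    /** Splits the range [startTs, endTs] into one single-row query per day bucket. */
    public static List<RowQuery> decompose(String sensorId, long startTs, long endTs) {
        if (startTs < 0 || endTs < startTs) {
            throw new IllegalArgumentException("expected 0 <= startTs <= endTs");
        }
        List<RowQuery> queries = new ArrayList<>();
        long firstBucket = startTs / BUCKET_MILLIS;
        long lastBucket = endTs / BUCKET_MILLIS;
        for (long bucket = firstBucket; bucket <= lastBucket; bucket++) {
            long bucketStart = bucket * BUCKET_MILLIS;
            long bucketEnd = bucketStart + BUCKET_MILLIS - 1;
            // Clip the requested range to the part that lies inside this bucket.
            queries.add(new RowQuery(sensorId, bucket,
                    Math.max(startTs, bucketStart), Math.min(endTs, bucketEnd)));
        }
        return queries;
    }
}
```

Each resulting RowQuery is a real range scan within one wide row, so the 
sub-queries can be run independently and their results concatenated in bucket 
order.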

The implementation is fully backward compatible. Simple key classes can still 
be used, and row scans are still possible with an order-preserving partitioner. 
All existing JUnit tests pass. Furthermore, I have added an example and some 
unit tests that demonstrate the use of composite primary keys for time series 
data.

As mentioned earlier, we are happy to share this extension. I've created a JIRA 
issue for it (GORA-267) and will provide the implementation on GitHub 
(https://github.com/zirpins/gora/tree/GORA-267).

Regards,
Christian


  was:
The extension allows primary keys to be defined as Avro classes. A mapping 
specifies how the fields of the key class are mapped to the components of 
composite partition keys and composite column names. This gives users more 
control over how data is distributed across Cassandra database structures. It 
is now possible to store data in wide rows with custom indexes that allow for 
fast range scans on a single node. There is also no longer any need for an 
order-preserving partitioner, which is likely to compromise data distribution 
in the Cassandra cluster.



> Cassandra composite primary key support
> ---------------------------------------
>
>                 Key: GORA-267
>                 URL: https://issues.apache.org/jira/browse/GORA-267
>             Project: Apache Gora
>          Issue Type: Improvement
>          Components: gora-cassandra
>            Reporter: [email protected]
>              Labels: features
>             Fix For: 0.4
>
>         Attachments: gora-267.diff
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
