[ https://issues.apache.org/jira/browse/GORA-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated GORA-267:
--------------------------------------
Description:
The extension allows primary keys to be defined as Avro classes. A mapping
specifies how the fields of the key class are mapped to the components of
composite partition keys and composite column names. This gives users more
control over how data is distributed across Cassandra database structures. It
is now possible to store data in wide rows with custom indexes that allow fast
range scans on a single node. There is also no longer any need for an
order-preserving partitioner, which is likely to compromise data distribution
in the Cassandra cluster.
In essence, composite primary keys with identical partition parts are written
to the same Cassandra row (which is essentially a partition). Within a row,
entities are stored in lexical order of their cluster key components. Avro
field names are appended as the last component of the composite column name.
The current implementation does not replace super columns, so complex Avro
fields are still mapped to super columns. Super column families use the same
composite primary keys as simple column families. Because Gora always loads
nested complex types in full, the use of super column families is not really a
problem. In future work, however, super columns could be replaced by another
level of column name components below the field qualifiers. It would also be
possible to rethink the decomposition of complex nested types beyond the first
level.
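To make the layout above more concrete, here is a minimal in-memory sketch in
Java. It is only an illustration and uses no Gora or Cassandra API: the class
name WideRowSketch, the ':' separator and the sensor/date sample data are all
assumptions made for this example. It also joins the column name components
into one string, whereas a real composite comparator compares component by
component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the wide-row layout; hypothetical, not Gora or Cassandra code. */
final class WideRowSketch {

    // partition key -> (composite column name -> value).
    // The inner TreeMap keeps the columns of one row in lexical order.
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    /** Builds a column name with the Avro field name as the last component. */
    static String columnName(String clusterKey, String field) {
        // ':' is an arbitrary separator chosen for this sketch.
        return clusterKey + ":" + field;
    }

    /** Stores one field value; equal partition keys end up in the same row. */
    void put(String partitionKey, String clusterKey, String field, String value) {
        rows.computeIfAbsent(partitionKey, k -> new TreeMap<>())
            .put(columnName(clusterKey, field), value);
    }

    /** Returns the column names of one row in their stored order. */
    List<String> columns(String partitionKey) {
        TreeMap<String, String> row = rows.get(partitionKey);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    /** Number of rows (partitions) currently held. */
    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        WideRowSketch store = new WideRowSketch();
        // Inserted out of order on purpose; the row keeps them sorted.
        store.put("sensor-1", "2014-01-02", "value", "20.5");
        store.put("sensor-1", "2014-01-01", "value", "19.0");
        store.put("sensor-2", "2014-01-01", "value", "7.3");
        System.out.println(store.columns("sensor-1"));
        System.out.println(store.rowCount() + " rows");
    }
}
```

Both "sensor-1" entries share a partition part and therefore one row, which is
what makes a range scan over their cluster key components a single-row
operation.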
The implementation uses the concept of Gora partitionQueries to decompose
row-scanning queries into a set of queries that each operate on a single row.
However, such a decomposition is not always possible, and real range scans are
limited to wide rows (partitions).
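The decomposition idea can be sketched for one simple case. The Java example
below assumes, purely for illustration, that the partition part of the key is
a fixed-size bucket of a numeric cluster key (for example a timestamp), so
that a range splits cleanly into one sub-query per row. The names
PartitionQuerySketch, RowQuery and decompose are hypothetical and are not part
of Gora. Keys whose partition part cannot be enumerated from the range do not
decompose this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a range scan into per-row sub-queries; hypothetical, not Gora code. */
final class PartitionQuerySketch {

    /** One sub-query: an inclusive range [start, end] inside a single row. */
    static final class RowQuery {
        final long bucket; // partition part, identifies the row
        final long start;  // first cluster key value to read in that row
        final long end;    // last cluster key value to read in that row

        RowQuery(long bucket, long start, long end) {
            this.bucket = bucket;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "row " + bucket + " [" + start + ", " + end + "]";
        }
    }

    /** Decomposes the inclusive range [start, end] into one query per bucket. */
    static List<RowQuery> decompose(long start, long end, long bucketSize) {
        if (bucketSize <= 0 || start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range or bucket size");
        }
        List<RowQuery> queries = new ArrayList<>();
        for (long bucket = start / bucketSize; bucket <= end / bucketSize; bucket++) {
            // Clip the requested range to the boundaries of this row.
            long rowStart = Math.max(start, bucket * bucketSize);
            long rowEnd = Math.min(end, (bucket + 1) * bucketSize - 1);
            queries.add(new RowQuery(bucket, rowStart, rowEnd));
        }
        return queries;
    }

    public static void main(String[] args) {
        // A scan over 5..25 with rows of width 10 touches three rows.
        for (RowQuery q : decompose(5, 25, 10)) {
            System.out.println(q);
        }
    }
}
```

Each resulting sub-query stays within one wide row, so it can be served as a
range scan on a single node.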
The implementation is fully backward compatible. Simple key classes can still
be used, and row scans are still possible with an order-preserving partitioner.
All existing JUnit tests pass. Furthermore, I have added an example and some
unit tests to demonstrate the use of composite primary keys for time series
data.
As mentioned earlier, we are happy to share this extension. I've created a JIRA
issue for it (GORA-267) and will provide the implementation on GitHub
(https://github.com/zirpins/gora/tree/GORA-267).
Regards,
Christian
was:
The extension allows primary keys to be defined as Avro classes. A mapping
specifies how the fields of the key class are mapped to the components of
composite partition keys and composite column names. This gives users more
control over how data is distributed across Cassandra database structures. It
is now possible to store data in wide rows with custom indexes that allow fast
range scans on a single node. There is also no longer any need for an
order-preserving partitioner, which is likely to compromise data distribution
in the Cassandra cluster.
> Cassandra composite primary key support
> ---------------------------------------
>
> Key: GORA-267
> URL: https://issues.apache.org/jira/browse/GORA-267
> Project: Apache Gora
> Issue Type: Improvement
> Components: gora-cassandra
> Reporter: [email protected]
> Labels: features
> Fix For: 0.4
>
> Attachments: gora-267.diff
>
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)