Dear Alfonso/Lewis,
thanks a lot for sharing your thoughts! I have worked a little on the topic
during the last few days and would like to share the results with you. Please
accept my apologies for the long text.
We've been using C* (Cassandra) for a while now in different contexts and
ultimately aim to build a more comprehensive solution for handling/analyzing
large data sets. I discovered Gora while thinking about a C* abstraction layer
and was intrigued by the idea of an ODM that - unlike many ORMs - considers the
characteristics of NoSQL data stores. I also like the direct link to Hadoop.
Currently, we're developing a prototype for processing large amounts of sensor
data. Some of our requirements are:
- utilizing a cluster setup that increases robustness and scales out linearly
in terms of size and performance
- fast processing of materialized views for near-real-time queries on large
data sets
Translated to Cassandra, this has a couple of implications:
- The solution needs to distribute data and load evenly across the cluster.
This is directly related to the partitioner, and only random partitioners
guarantee it (out of the box). OrderPreservingPartitioners tend to fill up the
nodes unevenly and create hot spots (bottlenecks). Random partitioners, however,
do not allow range scans over rows, as the rows are not stored in any
meaningful order.
- In any case, our need for fast range queries requires us to utilize C*'s
natural ordering of columns. A row is C*'s unit of partitioning. It is
guaranteed to be located on a single node. Mutators are guaranteed to act
atomically and in isolation. Range queries within a row are fast. That is why
you normally want to fill up rows nearly without restriction. "Skinny rows",
with just a couple of small columns, create overhead that is often not
justified.
- When storing large data sets in a partition, super columns are another
problem. C* needs to deserialize the whole set of sub columns even if you just
query for some of them. With too many sub columns, this obviously hurts
performance.
- Certainly, we also rely on C*'s replication and consistency features.
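The partitioner trade-off from the first point can be illustrated with a minimal sketch. It does not use any Cassandra API: hashedNode only imitates a random partitioner by hashing the row key with MD5, and orderedNode imitates an order-preserving one by looking at the first key byte. The class and method names, the node count and the "sensor-N" key format are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class PartitionerSketch {

    // Imitates a random partitioner: the token is an MD5 hash of the row key,
    // and each node owns an equal slice of the 128-bit token range.
    static int hashedNode(String rowKey, int nodes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            BigInteger token = new BigInteger(1, digest); // 0 <= token < 2^128
            return token.multiply(BigInteger.valueOf(nodes)).shiftRight(128).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Imitates an order-preserving partitioner: the token is the first key byte,
    // and each node owns an equal slice of the byte range 0..255.
    static int orderedNode(String rowKey, int nodes) {
        byte[] bytes = rowKey.getBytes(StandardCharsets.UTF_8);
        int first = bytes.length == 0 ? 0 : bytes[0] & 0xFF;
        return first * nodes / 256;
    }

    // Counts how many of the given row keys end up on each node.
    static int[] distribute(String[] rowKeys, int nodes, boolean hashed) {
        int[] counts = new int[nodes];
        for (String key : rowKeys) {
            counts[hashed ? hashedNode(key, nodes) : orderedNode(key, nodes)]++;
        }
        return counts;
    }

    // Hypothetical row keys that share a common prefix, as sensor ids often do.
    static String[] sampleKeys(int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = "sensor-" + i;
        }
        return keys;
    }

    public static void main(String[] args) {
        String[] keys = sampleKeys(1000);
        System.out.println("hashed:  " + Arrays.toString(distribute(keys, 4, true)));
        System.out.println("ordered: " + Arrays.toString(distribute(keys, 4, false)));
    }
}
```

Because all sample keys start with the same prefix, the ordered variant puts every key on one node, whereas the hashed variant spreads them over the nodes but loses the key order needed for row range scans.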
You will have noticed that the current design of the Gora Cassandra module is
not really in conformance with the above points. Still, I think the Gora
project has done fantastic work in providing basic concepts and infrastructure
for a big data abstraction layer. Moreover, it should generally be possible to
extend the design in a way that satisfies our needs.
One possibility is to use C* compound primary keys with composite partition
keys. In short, compound primary keys allow defining multiple hierarchical
parts of a column name within a single row. Higher-level parts can be used as
an index for lower-level ones. Still, (possibly different) comparators are
defined for all levels of the name, and range scans over the parts are
possible. Such composite names can be used for indices and as a substitute for
super columns. Composite partition keys allow controlling how data is split
into multiple rows (partitions) if the data set referenced by the partition key
is too large for a single row (e.g. if it doesn't fit on a node).
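To make the idea of hierarchical column names more concrete, here is a minimal in-memory sketch. It models one wide row as a sorted map whose keys are composite names (month, week, date), with one comparator per part. This is only an analogy built on java.util.TreeMap, not Cassandra or Gora code; the class, the Name type and the scan helpers are hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

class CompositeColumnSketch {

    // One composite column name; higher-level parts come first.
    static final class Name {
        final int month;
        final int week;
        final long date;

        Name(int month, int week, long date) {
            this.month = month;
            this.week = week;
            this.date = date;
        }
    }

    // Each part has its own comparator; parts are compared from left to right.
    static final Comparator<Name> ORDER =
            Comparator.comparingInt((Name n) -> n.month)
                    .thenComparingInt(n -> n.week)
                    .thenComparingLong(n -> n.date);

    // Range scan over the lowest-level part within a fixed (month, week) prefix.
    static NavigableMap<Name, Double> scan(NavigableMap<Name, Double> row,
                                           int month, int week, long from, long to) {
        return row.subMap(new Name(month, week, from), true,
                          new Name(month, week, to), true);
    }

    // Prefix scan: all columns below one month, whatever the week and date.
    static NavigableMap<Name, Double> scanMonth(NavigableMap<Name, Double> row, int month) {
        return row.subMap(new Name(month, Integer.MIN_VALUE, Long.MIN_VALUE), true,
                          new Name(month, Integer.MAX_VALUE, Long.MAX_VALUE), true);
    }

    // A single wide row with a few made-up readings.
    static NavigableMap<Name, Double> sampleRow() {
        NavigableMap<Name, Double> row = new TreeMap<>(ORDER);
        row.put(new Name(6, 3, 100L), 1.0);
        row.put(new Name(6, 3, 200L), 2.0);
        row.put(new Name(6, 3, 300L), 3.0);
        row.put(new Name(6, 4, 150L), 4.0);
        row.put(new Name(7, 1, 50L), 5.0);
        return row;
    }

    public static void main(String[] args) {
        NavigableMap<Name, Double> row = sampleRow();
        System.out.println("dates 150..300 in month 6, week 3: "
                + scan(row, 6, 3, 150L, 300L).values());
        System.out.println("all of month 6: " + scanMonth(row, 6).values());
    }
}
```

The point of the sketch is that a fixed prefix of higher-level parts acts as an index, while the remaining parts can still be scanned as a contiguous range.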
The idea for exploiting these concepts in Gora builds on adding a key mapping
part to the gora-cassandra-mapping. An Avro class might define a primary key
element. The key is represented by another Avro class. The mapping defines
which fields of the Avro key class hold the data for which parts of the C*
compound primary key. Additionally, the field names of the Avro value class are
appended to the composite column name. Nested complex types that were formerly
mapped to sub columns are handled accordingly by appending their name and
subkeys as column names. As C* composite column names can be defined
dynamically, an arbitrary level of nesting of complex types would also be
possible.
An open question for me is whether this concept is also viable for other data
stores and whether it would make sense to add something along these lines to
the Gora core. At least for Cassandra it seems to do the trick.
As a POC I am implementing this on top of the current Gora HEAD (just the
Cassandra module). Thanks to the good Gora infrastructure, I've finished a
first version of the data creation/update part and can give you an example. It
shows a simple Avro value class for sensor readings. The sensor key contains
the sensor id (meterId) as well as a hierarchy of time periods (year, month,
week, date) for aggregation and indexing. Here's the example Avro spec:
[ {
"type" : "record",
"name" : "SensorKey",
"namespace" : "com.foo.generated",
"fields" : [ {
"name" : "meterId",
"type" : "string"
}, {
"name" : "year",
"type" : "int"
}, {
"name" : "month",
"type" : "int"
}, {
"name" : "week",
"type" : "int"
}, {
"name" : "date",
"type" : "long"
} ]
}, {
"type" : "record",
"name" : "SensorReading",
"namespace" : "com.foo.generated",
"fields" : [ {
"name" : "reading",
"type" : "double"
} ]
} ]
The corresponding gora-cassandra-mapping is shown below. As a result, all meter
reading objects of a single year go into one partition (row) and can be
retrieved by date ranges. In addition, the CF holds aggregations for months and
weeks, and the mapping specifies data replication for the keyspace.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Gora Mapping file for Cassandra Backend -->
<gora-orm>
<keyspace name="SensorData" cluster="Test Cluster" host="localhost:9160"
replicationFactor="1"
replicationStrategy="org.apache.cassandra.locator.SimpleStrategy">
<family name="meterReading" />
</keyspace>
<class
name="com.foo.generated.SensorReading"
keyspace="SensorData"
keyClass="com.foo.generated.SensorKey">
<field name="reading" family="meterReading" qualifier="reading" />
</class>
<primaryKey
name="com.foo.generated.SensorKey"
compactStorage="false">
<partitionKey>
<field name="meterId" type="UTF8Type" />
<field name="year" type="IntegerType" />
</partitionKey>
<clusterKey>
<field name="month" type="IntegerType" />
<field name="week" type="IntegerType" />
<field name="date" type="LongType" />
</clusterKey>
</primaryKey>
</gora-orm>
Here is some sample code to fill the CF:
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Random;
import org.apache.avro.util.Utf8;

// dataStore is a Gora DataStore<SensorKey, SensorReading> created elsewhere
// write 1000 readings
Random rand = new Random();
for (int i = 0; i < 1000; i++) {
  // sample reading
  SensorReading newReading = new SensorReading();
  newReading.setReading(rand.nextDouble());
  // sample key: meter id plus the time hierarchy (year, month, week, date)
  SensorKey key = new SensorKey();
  key.setMeterId(new Utf8("foo"));
  Date date = new Date();
  GregorianCalendar gc = new GregorianCalendar();
  gc.setTime(date);
  key.setDate(date.getTime());
  // note: Calendar.MONTH is zero-based, so July is stored as 6
  key.setMonth(gc.get(GregorianCalendar.MONTH));
  key.setYear(gc.get(GregorianCalendar.YEAR));
  key.setWeek(gc.get(GregorianCalendar.WEEK_OF_MONTH));
  // put in store
  dataStore.put(key, newReading);
  dataStore.flush();
}
And this is what the resulting meterReading CF looks like in cassandra-cli
(note that it is just one row):
RowKey: s@foo:i@2013
=> (name=i@6:i@3:l@1374252790759:s@reading, value=3fe7a9b7da088b79,
timestamp=1374252790781000)
=> (name=i@6:i@3:l@1374252790806:s@reading, value=3fefe94fe4fe8239,
timestamp=1374252790806000)
=> (name=i@6:i@3:l@1374252790807:s@reading, value=3fdc467469b1c6a8,
timestamp=1374252790807000)
=> (name=i@6:i@3:l@1374252790808:s@reading, value=3fe343b74381da8c,
timestamp=1374252790808000)
=> (name=i@6:i@3:l@1374252790809:s@reading, value=3fd827df8f477682,
timestamp=1374252790809000)
=> (name=i@6:i@3:l@1374252790810:s@reading, value=3fe58823ebc94fca,
timestamp=1374252790811000)
=> (name=i@6:i@3:l@1374252790812:s@reading, value=3fd1745235382594,
timestamp=1374252790812000)
=> (name=i@6:i@3:l@1374252790813:s@reading, value=3fba982847b15530,
timestamp=1374252790813000)
=> (name=i@6:i@3:l@1374252790814:s@reading, value=3feb3a6d9c672c62,
timestamp=1374252790814000)
=> (name=i@6:i@3:l@1374252790815:s@reading, value=3fe6c8c7926df7d1,
timestamp=1374252790815000)
=> (name=i@6:i@3:l@1374252790816:s@reading, value=3fd5e70878a98e50,
timestamp=1374252790816000)
...
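The textual form of the row key and column names in this listing can be rebuilt with a small sketch. It only mirrors what cassandra-cli prints above (a type prefix s@, i@ or l@ per component, joined by ":"); it says nothing about the binary encoding, and the class and helper names are hypothetical.

```java
import java.util.StringJoiner;

class CompositeNameFormatSketch {

    // Prefix per component type as seen in the listing:
    // s@ for strings, i@ for ints, l@ for longs.
    static String part(Object value) {
        if (value instanceof Integer) {
            return "i@" + value;
        }
        if (value instanceof Long) {
            return "l@" + value;
        }
        if (value instanceof CharSequence) {
            return "s@" + value;
        }
        throw new IllegalArgumentException("unsupported component: " + value);
    }

    // Joins all components with ':' in the given order.
    static String join(Object... parts) {
        StringJoiner joiner = new StringJoiner(":");
        for (Object p : parts) {
            joiner.add(part(p));
        }
        return joiner.toString();
    }

    // Partition key parts from the mapping: meterId and year.
    static String rowKey(String meterId, int year) {
        return join(meterId, year);
    }

    // Cluster key parts (month, week, date) followed by the value field name.
    static String columnName(int month, int week, long date, String field) {
        return join(month, week, date, field);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("foo", 2013));
        // prints s@foo:i@2013
        System.out.println(columnName(6, 3, 1374252790759L, "reading"));
        // prints i@6:i@3:l@1374252790759:s@reading
    }
}
```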
I'm trying to finish the Gora C* store this month. I'm happy to share the patch
with you if you are interested.
best regards,
christian
-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: Friday, July 19, 2013 01:42
To: <[email protected]>
Subject: Re: Gora Cassandra module design rationale
Hi Christian,
On Thu, Jul 18, 2013 at 3:12 PM, <[email protected]> wrote:
> in the course of evaluating Gora, I'm looking for information on the
> design rationale behind the Gora Cassandra module.
>
OK here we go :0)
>
> In particular, I try to find information on the following (debatable?)
> points:
>
I like your choice of words... ;)
>
> - Gora keys are mapped to C* partition keys only
>
Yes, this is true. You can see this in CassandraStore#addSubColumns &
#addSuperColumns, where we follow the Cassandra logic that CF data is
partitioned across nodes based on row key. This would actually be very nice
Javadoc for such methods (even though they are private); however, we should
also annotate CassandraStore#execute(query) or (partitionQuery), as the same
principle applies for this method as well.
> - Gora requires the C* ByteOrderedPartitioner
>
Mmmm... Kaz recently changed the Embedded Cassandra server in the tests to use
ByteOrderedPartitioner as testQueryWebPageQueryEmptyResults (and some other
tests) were failing
https://issues.apache.org/jira/browse/GORA-157
> - Comparators are always BYTESTYPE
>
> - Gora makes extensive use of super column families
>
As Alfonso said, yes we do. Right now this is the data modelling approach we
have and maintain. We (Renato and myself) recently discussed @C*Summit that
this is going to need to change.
When we cross this bridge it will be a (most likely non-backwards
compatible) re-write of the bulk of the C*Module in Gora.
I assume you guys are investing in C* for medium/long term? I can tell you for
sure that we will be changing Gora... however as Alfonso said we will make best
efforts to support backwards compat.
> - The implementation has a hard-coded replication factor of 1
>
Right now we use Hector client. Roland and myself have a number of issues open
to make better use of features provided to us by Hector. Please see below
https://issues.apache.org/jira/browse/GORA-214
https://issues.apache.org/jira/browse/GORA-98
https://issues.apache.org/jira/browse/GORA-209
https://issues.apache.org/jira/browse/GORA-215
https://issues.apache.org/jira/browse/GORA-167
wow I didn't realize we had so many open!!! Yikes.
>
> - Gora doesn't seem to take advantage of native column ordering in C*
>
As per Alfonso's quote
> - Gora doesn't seem to utilize C* compound primary keys
>
As per Alfonso's quote
> - Gora doesn't seem to allow the use of per-operation consistency levels
>
Please see issue above.
I hope this has answered some of your queries. Apologies for taking a while to
get back. Been breaking my back @work ;) Thanks Lewis
...