AW: Gora Cassandra module design rationale

Zirpins. Christian Sat, 20 Jul 2013 06:34:08 -0700

Dear Alfonso/Lewis,

thanks a lot for sharing your thoughts! I have worked a little on the topic
over the last few days and would like to share the results with you. Please
accept my apologies for the long text.

We've been using C* for a while now in different contexts and ultimately aim to 
build a more comprehensive solution for handling/analyzing large data sets.

I discovered Gora while thinking about a C* abstraction layer and was
intrigued by the idea of an ODM that - unlike many ORMs - considers the
characteristics of NoSQL data stores. I also like the direct link to Hadoop.

Currently, we're developing a prototype for processing large amounts of sensor 
data. Some of our requirements are:

- utilizing a cluster setup that raises robustness and scales out linearly in
terms of size and performance
- fast processing of materialized views for near-real-time queries on large 
data sets

Translated to Cassandra, this has a couple of implications:

- the solution needs to distribute data and load evenly in the cluster. This is
directly related to the partitioner, and only random partitioners guarantee it
(out of the box). OrderPreservingPartitioners tend to fill up the nodes
unevenly and create hot spots (bottlenecks). Random partitioners, however, do
not allow range scans over rows, as the rows are not stored in any meaningful
order.

- in any case, our need for fast range queries requires us to utilize C*'s
natural ordering of columns. A row is C*'s approach to partitioning. It is
guaranteed to be located on a single node. Mutators acting on a row are
guaranteed to act atomically and in isolation. Range queries over the columns
of a row are fast. That is why you normally want to fill up rows nearly without
restriction. "Skinny rows", with just a couple of small columns, create
overhead that is often not justified.

- when storing large data sets in a partition, super columns are another
problem. C* needs to de-serialize the whole set of sub columns even if you just
query for some of them. When you have too many sub columns, this obviously
hurts performance.

- certainly we rely on C*'s replication and consistency features.
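
To make the partitioner point concrete, here is a minimal sketch in plain Java
(not Cassandra or Gora code). It assumes an MD5 hash as a stand-in for the token
of a hash-based partitioner; the real token computation may differ in detail.
The class name, method names and row keys are made up for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PartitionerSketch {

    // Token as a hash-based ("random") partitioner might derive it.
    // Assumption: MD5 of the UTF-8 row key, read as a non-negative number.
    static BigInteger md5Token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Row order under an order-preserving partitioner: sorted by key.
    static List<String> orderedByKey(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Row order under a hash-based partitioner: sorted by token.
    static List<String> orderedByToken(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(PartitionerSketch::md5Token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            keys.add("sensor-0" + i);
        }
        // Neighbouring keys stay neighbours only in the first ordering.
        System.out.println("by key:   " + orderedByKey(keys));
        System.out.println("by token: " + orderedByToken(keys));
    }
}
```

Sorting by token spreads neighbouring keys over the token ring, which is good
for load balancing, but a scan over a key range no longer maps to a contiguous
stretch of rows.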

You will have noticed that the current design of the Gora cassandra module is 
not really in conformance with the above points. Still I think Gora did 
fantastic work in providing basic concepts and infrastructure for a big data 
abstraction layer. Moreover, it should be generally possible to extend the 
design in a way that satisfies our needs.

A possibility is to use C* compound primary keys with composite partition
sub-keys. In short, compound primary keys allow defining multiple
hierarchical parts of a column name within a single row. Higher-level parts can
be used as an index for lower-level ones. Still, (possibly different)
comparators are defined for all levels of names, and range scans over the parts
are possible. Such composite names can be used for indices and to substitute
super columns. Composite partition keys allow controlling the separation of
data into multiple rows (partitions) if the data set referenced by the
partition key is too large for a single row (e.g. if it doesn't fit on a node).
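
The effect of hierarchical column names can be sketched with a sorted map
standing in for a single wide row. This is a simplified model, not C* code: all
name components are assumed to be longs, names compare component by component,
and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

class CompositeRowSketch {

    // Compare composite names component by component; a name that is a
    // prefix of another name sorts before it.
    static final Comparator<List<Long>> COMPOSITE = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = Long.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // One wide row: columns are kept sorted by their composite name.
    static NavigableMap<List<Long>, Double> newRow() {
        return new TreeMap<>(COMPOSITE);
    }

    // Range scan using only the highest-level part (here: a month) as index.
    static SortedMap<List<Long>, Double> sliceByMonth(
            NavigableMap<List<Long>, Double> row, long month) {
        return row.subMap(Arrays.asList(month), true,
                          Arrays.asList(month + 1), false);
    }

    public static void main(String[] args) {
        NavigableMap<List<Long>, Double> row = newRow();
        // name = (month, week, timestamp), value = reading
        row.put(Arrays.asList(5L, 4L, 1000L), 0.11);
        row.put(Arrays.asList(6L, 3L, 2000L), 0.22);
        row.put(Arrays.asList(6L, 3L, 3000L), 0.33);
        System.out.println(sliceByMonth(row, 6L));
    }
}
```

Because the columns are sorted by the full composite name, a query on a
higher-level part becomes a contiguous slice of the row. This is the property
that lets composite names replace super columns and act as an index.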

The idea for exploiting these concepts in Gora builds on adding a key
mapping part to the gora-cassandra-mapping. An Avro class might define a
primary key element. The key is represented by another Avro class. The mapping
defines which fields of the Avro key class hold the data for which parts of the
C* compound primary key. Additionally, the field names of the Avro value class
are appended to the composite column name. Nested complex types that were
formerly mapped to sub columns are handled accordingly by appending their
names and subkeys as column names. As C* composite column names can be defined
dynamically, an arbitrary level of nesting of complex types would also be
possible.

An open question for me is whether this concept is also viable for other
data stores and whether it would make sense to add something along these lines
to the Gora core. At least for Cassandra it seems to do the trick.

As a POC I am implementing this on top of the current Gora HEAD (just the
Cassandra module). Thanks to the good Gora infrastructure, I've finished a
first version of the data creation/update part and can give you an example. It
shows a simple Avro value class for sensor readings. The sensor key contains
the sensor id as well as several hierarchies of durations for aggregation and
indexing. Here's the example Avro spec:

[ {
        "type" : "record",
        "name" : "SensorKey",
        "namespace" : "com.foo.generated",
        "fields" : [ {
                "name" : "meterId",
                "type" : "string"
        }, {
                "name" : "year",
                "type" : "int"
        }, {
                "name" : "month",
                "type" : "int"
        }, {
                "name" : "week",
                "type" : "int"
        }, {
                "name" : "date",
                "type" : "long"
        } ]
}, {
        "type" : "record",
        "name" : "SensorReading",
        "namespace" : "com.foo.generated",
        "fields" : [ {
                "name" : "reading",
                "type" : "double"
        } ]
} ]

The corresponding gora-cassandra-mapping is shown below. The result is that
all reading objects of one meter for a single year go into one partition (row)
and can be retrieved by date ranges. The CF also holds aggregations for months
and weeks. Finally, the mapping specifies data replication for the keyspace.

<?xml version="1.0" encoding="UTF-8"?>

<!-- Gora Mapping file for Cassandra Backend -->
<gora-orm>

    <keyspace name="SensorData" cluster="Test Cluster" host="localhost:9160"
        replicationFactor="1"
        replicationStrategy="org.apache.cassandra.locator.SimpleStrategy">
        <family name="meterReading" />
    </keyspace>

    <class
        name="com.foo.generated.SensorReading"
        keyspace="SensorData"
        keyClass="com.foo.generated.SensorKey">
        <field name="reading" family="meterReading" qualifier="reading" />
    </class>

    <primaryKey
        name="com.foo.generated.SensorKey"
        compactStorage="false">
        <partitionKey>
            <field name="meterId" type="UTF8Type" />
            <field name="year" type="IntegerType" />
        </partitionKey>
        <clusterKey>
            <field name="month" type="IntegerType" />
            <field name="week" type="IntegerType" />
            <field name="date" type="LongType" />
        </clusterKey>
    </primaryKey>

</gora-orm>
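
One way to read the mapping above is as a rule for splitting a key object into
a row key and a column name. The following is only a sketch of that rule in
plain Java, not the code of the POC: the key is modelled as a simple map, the
field lists are copied by hand from the partitionKey and clusterKey elements
(with meterId as in the Avro spec), and the class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyMappingSketch {

    // Key fields that form the partition (row) key.
    static final List<String> PARTITION_KEY = Arrays.asList("meterId", "year");
    // Key fields that form the leading parts of the column name.
    static final List<String> CLUSTER_KEY = Arrays.asList("month", "week", "date");

    // Row key: the values of the partition key fields, in mapping order.
    static List<Object> rowKey(Map<String, Object> key) {
        List<Object> parts = new ArrayList<>();
        for (String field : PARTITION_KEY) {
            parts.add(key.get(field));
        }
        return parts;
    }

    // Column name: cluster key values followed by the value field's name.
    static List<Object> columnName(Map<String, Object> key, String valueField) {
        List<Object> parts = new ArrayList<>();
        for (String field : CLUSTER_KEY) {
            parts.add(key.get(field));
        }
        parts.add(valueField);
        return parts;
    }

    public static void main(String[] args) {
        Map<String, Object> key = new LinkedHashMap<>();
        key.put("meterId", "foo");
        key.put("year", 2013);
        key.put("month", 6);
        key.put("week", 3);
        key.put("date", 1374252790759L);
        System.out.println("row key:     " + rowKey(key));
        System.out.println("column name: " + columnName(key, "reading"));
    }
}
```

All keys that share meterId and year end up in the same row, while month, week
and date decide where the column is sorted within that row.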

Here is some sample code to fill the CF:

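    // imports used by this snippet
    import java.util.Date;
    import java.util.GregorianCalendar;
    import java.util.Random;
    import org.apache.avro.util.Utf8;

    // write 1000 readings
    // (dataStore is assumed to be an already initialized Gora data store
    // for SensorKey/SensorReading)
    Random rand = new Random();
    for (int i = 0; i < 1000; i++) {
      // sample reading
      SensorReading newReading = new SensorReading();
      newReading.setReading(rand.nextDouble());
      // sample key
      SensorKey key = new SensorKey();
      key.setMeterId(new Utf8("foo"));
      Date date = new Date();
      GregorianCalendar gc = new GregorianCalendar();
      gc.setTime(date);
      key.setDate(date.getTime());
      // note: Calendar months are zero-based, so July is stored as 6
      key.setMonth(gc.get(GregorianCalendar.MONTH));
      key.setYear(gc.get(GregorianCalendar.YEAR));
      key.setWeek(gc.get(GregorianCalendar.WEEK_OF_MONTH));
      // put in store
      dataStore.put(key, newReading);
      dataStore.flush();
    }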

And this is what the resulting meterReading CF looks like in cassandra-cli
(note that it is just one row):

RowKey: s@foo:i@2013
=> (name=i@6:i@3:l@1374252790759:s@reading, value=3fe7a9b7da088b79, timestamp=1374252790781000)
=> (name=i@6:i@3:l@1374252790806:s@reading, value=3fefe94fe4fe8239, timestamp=1374252790806000)
=> (name=i@6:i@3:l@1374252790807:s@reading, value=3fdc467469b1c6a8, timestamp=1374252790807000)
=> (name=i@6:i@3:l@1374252790808:s@reading, value=3fe343b74381da8c, timestamp=1374252790808000)
=> (name=i@6:i@3:l@1374252790809:s@reading, value=3fd827df8f477682, timestamp=1374252790809000)
=> (name=i@6:i@3:l@1374252790810:s@reading, value=3fe58823ebc94fca, timestamp=1374252790811000)
=> (name=i@6:i@3:l@1374252790812:s@reading, value=3fd1745235382594, timestamp=1374252790812000)
=> (name=i@6:i@3:l@1374252790813:s@reading, value=3fba982847b15530, timestamp=1374252790813000)
=> (name=i@6:i@3:l@1374252790814:s@reading, value=3feb3a6d9c672c62, timestamp=1374252790814000)
=> (name=i@6:i@3:l@1374252790815:s@reading, value=3fe6c8c7926df7d1, timestamp=1374252790815000)
=> (name=i@6:i@3:l@1374252790816:s@reading, value=3fd5e70878a98e50, timestamp=1374252790816000)
...
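
For reading the listing: the values are the readings shown as hex, and the long
part of each column name is the date in milliseconds. A small sketch for
decoding both, assuming the value is the 8-byte big-endian IEEE 754 encoding of
the double (which fits the values above, as rand.nextDouble() yields numbers
below 1.0). The class and method names are made up.

```java
import java.time.Instant;

class CliValueSketch {

    // Hex string of 8 bytes, as printed by the CLI, back to a double.
    static double decodeDouble(String hex) {
        return Double.longBitsToDouble(Long.parseUnsignedLong(hex, 16));
    }

    // Date component of a column name (epoch milliseconds) as an instant.
    static Instant decodeDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        // first column of the listing
        System.out.println(decodeDouble("3fe7a9b7da088b79"));
        System.out.println(decodeDate(1374252790759L));
    }
}
```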

I'm trying to finish the Gora C* store this month. I'm happy to share the patch 
with you if you are interested.

best regards,
christian

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: Friday, 19 July 2013 01:42
To: <[email protected]>
Subject: Re: Gora Cassandra module design rationale

Hi Christian,

On Thu, Jul 18, 2013 at 3:12 PM, <[email protected]> wrote:

> in the course of evaluating Gora, I'm looking for information on the
> design rationale behind the Gora Cassandra module.
>

OK here we go :0)

>
> In particular, I try to find information on the following (debatable?)
> points:
>

I like your choice of words... ;)

>
> - Gora keys are mapped to C* partition keys only
>

Yes this is true. You can see this in CassandraStore#addSubColumns &
#addSuperColumns, where we follow the Cassandra logic that CF data is
partitioned across nodes based on row key. This would actually be very nice
Javadoc for such methods (even though they are private); however, we should
also annotate CassandraStore#execute(query) or (partitionQuery), as the same
principle applies to this method as well.

> - Gora requires the C* ByteOrderedPartitioner
>

Mmmm... Kaz recently changed the Embedded Cassandra server in the tests to use 
ByteOrderedPartitioner as testQueryWebPageQueryEmptyResults (and some other 
tests) were failing
https://issues.apache.org/jira/browse/GORA-157

> - Comparators are always BYTESTYPE
>

> - Gora makes extensive use of super column families
>

As Alfonso said, yes we do. Right now this is the data modelling approach we 
have and maintain. We (Renato and myself) recently discussed @C*Summit that 
this is going to need to change.
When we cross this bridge it will be a (most likely non-backwards
compatible) re-write of the bulk of the C* module in Gora.
I assume you guys are investing in C* for medium/long term? I can tell you for 
sure that we will be changing Gora... however as Alfonso said we will make best 
efforts to support backwards compat.

> - The implementation has a hard-coded replication factor of 1
>

Right now we use Hector client. Roland and myself have a number of issues open 
to make better use of features provided to us by Hector. Please see below
https://issues.apache.org/jira/browse/GORA-214
https://issues.apache.org/jira/browse/GORA-98
https://issues.apache.org/jira/browse/GORA-209
https://issues.apache.org/jira/browse/GORA-215
https://issues.apache.org/jira/browse/GORA-167

wow I didn't realize we had so many open!!! Yikes.

>
> - Gora doesn't seem to take advantage of native column ordering in C*
>

As per Alfonso's quote

> - Gora doesn't seem to utilize C* compound primary keys
>

As per Alfonso's quote

> - Gora doesn't seem to allow the use of per-operation consistency levels
>

Please see issue above.

I hope this has answered some of your queries. Apologies for taking a while to 
get back. Been breaking my back @work ;) Thanks Lewis

...
