Author: djkevincr
Date: Sun Sep 3 10:55:57 2017
New Revision: 1807139

URL: http://svn.apache.org/viewvc?rev=1807139&view=rev
Log:
GORA-521 add documentaiton for cassandra store
Modified:
    gora/site/trunk/content/current/gora-cassandra.md

Modified: gora/site/trunk/content/current/gora-cassandra.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/gora-cassandra.md?rev=1807139&r1=1807138&r2=1807139&view=diff
==============================================================================
--- gora/site/trunk/content/current/gora-cassandra.md (original)
+++ gora/site/trunk/content/current/gora-cassandra.md Sun Sep 3 10:55:57 2017
@@ -24,16 +24,28 @@ enables [Apache Cassandra](http://cassan
     <td>Implementation of the persistent Java storage class</td>
   </tr>
   <tr>
-    <td>gora.cassandra.mapping.file=</td>
+    <td>gora.cassandrastore.mapping.file=</td>
     <td>/path/to/gora-cassandra-mapping.xml</td>
     <td>No</td>
     <td>The XML mapping file to be used. If no value is used this defaults to <code>gora-cassandra-mapping.xml</code></td>
   </tr>
   <tr>
-    <td>gora.cassandra.servers=</td>
-    <td>localhost:9160</td>
+    <td>gora.cassandrastore.cassandraServers=</td>
+    <td>localhost</td>
     <td>Yes</td>
-    <td>This value should specify the host:port for a running Cassandra server or node. In this case the server happens to be running on localhost at port 9160 which is the default Cassandra server configuration. It is important that the <b>host</b> matches that specified in <code>gora-cassandra-mapping.xml</code></td>
+    <td>This value should specify the host for a running Cassandra server or node. In this case the server happens to be running on localhost which is the default Cassandra server configuration.</td>
+  </tr>
+  <tr>
+    <td>gora.cassandrastore.port=</td>
+    <td>9042</td>
+    <td>Yes</td>
+    <td>This value should specify the cql port for a running Cassandra server or node. In this case the server happens to be running on 9042 port which is the default Cassandra server configuration.</td>
+  </tr>
+  <tr>
+    <td>gora.cassandrastore.clusterName=</td>
+    <td>Test Cluster</td>
+    <td>No</td>
+    <td>This value should specify the cassandra cluster name for a running Cassandra server or node. In this case the server has configured to run with Cassandra cluster name as 'Test Cluster' which is the default Cassandra server configuration.</td>
   </tr>
   <tr>
     <td>gora.cassandrastore.username=</td>
@@ -47,80 +59,115 @@ enables [Apache Cassandra](http://cassan
     <td>No</td>
     <td>The authentication details for passing a <b>password</b> to the CassandraHostConfigurator. This will be required if security is required for Cassandra reads and writes.</td>
   </tr>
-  </tboday>
+  <tr>
+    <td>gora.cassandrastore.cassandraSerializationType=</td>
+    <td>AVRO/NATIVE</td>
+    <td>No</td>
+    <td>The serialization type for persist into the cassandra data store. default value is Native serialization type</td>
+  </tr>
+  </tbody>
 </table>

 ##Gora Cassandra mappings

-Say we wished to map some Employee data and store it into the CassandraStore.
+Say we wished to map some CassandraRecord data and store it into the CassandraStore.
+
+
+<gora-otd>
+
+  <keyspace name="RecordKeySpace" durableWrite="false">
+    <placementStrategy name="SimpleStrategy" replication_factor="1"/>
+  </keyspace>
+
+  <class name="org.apache.gora.cassandra.example.generated.AvroSerialization.CassandraRecord"
+         keyClass="org.apache.gora.cassandra.example.generated.AvroSerialization.CassandraKey"
+         keyspace="RecordKeySpace"
+         table="CassandraRecord" allowFiltering="true" id="5a1c395e-b41f-11e5-9f22-ba0be0483c18">
+    <field name="name" column="name" type="text"/>
+    <field name="dataInt" column="age" type="int"/>
+    <field name="salary" column="salary" type="bigint"/>
+    <field name="dataDouble" column="testDouble" type="double"/>
+    <field name="dataBytes" column="quotes" type="blob"/>
+    <field name="arrayInt" column="listInt" type="list(int)"/>
+    <field name="arrayString" column="listString" type="list(text)"/>
+    <field name="arrayLong" column="listLong" type="list(bigint)"/>
+    <field name="arrayDouble" column="listDouble" type="list(double)"/>
+    <field name="mapInt" column="mapInt" type="map(text,int)"/>
+    <field name="mapString" column="mapString" type="map(text,text)"/>
+    <field name="mapLong" column="mapLong" type="map(text,bigint)"/>
+    <field name="mapDouble" column="mapDouble" type="map(text,double)"/>
+  </class>
+
+  <cassandraKey name="org.apache.gora.cassandra.example.generated.AvroSerialization.CassandraKey">
+    <partitionKey>
+      <field name="url" column="urlData" type="text"/>
+      <field name="timestamp" column="timestampData" type="bigint"/>
+    </partitionKey>
+    <clusterKey>
+      <key column="timestampData" order="desc"/>
+    </clusterKey>
+  </cassandraKey>
+
+</gora-otd>

-    <gora-otd>
-      <keyspace name="Employee" host="localhost" placement_strategy="org.apache.cassandra.locator.SimpleStrategy"
-        replication_factor="1" cluster="Gora Cassandra Test Cluster">
-        <family name="p" gc_grace_seconds="5"/>
-        <family name="f" gc_grace_seconds="5"/>
-        <family name="sc" type="super" />
-      </keyspace>
-
-      <class name="org.apache.gora.examples.generated.Employee" keyClass="java.lang.String" keyspace="Employee">
-        <field name="name" family="p" qualifier="info:nm" ttl="10"/>
-        <field name="dateOfBirth" family="p" qualifier="info:db" ttl="10"/>
-        <field name="ssn" family="p" qualifier="info:sn" ttl="10"/>
-        <field name="salary" family="p" qualifier="info:sl" ttl="10"/>
-      </class>
-    </gora-otd>

 Here you can see that we require the definition of two child elements within the <code>gora-otd</code> mapping configuration, namely;

 The <b>keyspace</b> element; where we specify:

-1. a parameter containing the Cassandra keyspace schema name e.g. <b>Employee</b>,
+1. a parameter containing the Cassandra keyspace schema name e.g. <b>RecordKeySpace</b>,
+
+2. a parameter containing the durable write enabled property in the Cassandra keyspace e.g. <b>false</b>, More about durable write can be found [here](http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_durability_c.html).

-2. a parameter containing the host e.g. <b>localhost</b>. The value of the host attribute of keyspace tag should match exactly what is in
-   gora.properties file. Essentially this means that if you are using port number, you should
-   use it everywhere regardless of whether it is the default port or not.
-   At runtime Gora will otherwise try to connect to localhost. For more information please see [here](https://issues.apache.org/jira/browse/GORA-269)
-
-3. a parameter containing the Cassandra cluster name e.g. <b>Gora Cassandra Test Cluster</b>,
-
-4. a parameter containing a <b>placement_strategy</b>: The value of 'placement_strategy' should be a fully qualifed class name that is known to
-   the cassansra cluster, not the application or Gora. As of this writing, the classes that ship
-   with cassandra are:
-   <code>org.apache.cassandra.locator.SimpleStrategy</code> and
-   <code>org.apache.cassandra.locator.NetworkTopologyStrategy</code>.
-   gora-cassandra will use SimpleStrategy by default if no value for this attribute is specified. Finally
-   it should be noted that the placement_strategy attribute of the keyspace tag
-   will only apply if Gora creates the Cassandra Keyspace. More about placement strategies can be found
-   [here](http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architectureDataDistributeReplication_c.html).
+3. the child element <b>placementStrategy</b> containing the Cassandra placementStrategy details, a parameter containing the Cassandra placement strategy name, e.g. <b>SimpleStrategy</b>, gora-cassandra will use SimpleStrategy by default if no value for this attribute is specified. More about placement strategies can be found [here](http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html).

-5. a parameter containing a <b>replication_factor</b> attribute with value integer. Again the replacation_factor value associated with the Keyspace tag
+4. a parameter containing a <b>replicationFactor</b> attribute with value integer. Again the replicationFactor value associated with the Keyspace tag
    will only apply if Gora creates the Keyspace and will have no effect if this is being used against
-   an existing keyspace. the default value for 'replication_factor' is '1'. <b>N.B.</b>In Cassandra this property is required if the placement_strategy
-   class is SimpleStrategy; otherwise, not used. This value essentially relates to the number of replicas of data you want to reside on multiple nodes.
+   an existing keyspace. the default value for 'replicationFactor' is '1'.

-6. A child element <b>family</b> containing the <b>name</b>, <b>type</b> and <b>gc_grace_seconds</b> parameters for column families we wish to create within Cassandra. In this case we create three columns; <b>p</b>, <b>f</b> and <b>sc</b> the last of which contains an optional <b>type</b> attribute which further defines this as a super column.
-   Additonally, column families <b>p</b> and <b>f</b> assign a value of 5 to <b>gc_grace_seconds</b>. In Gora we define the default value of 'gc_grace_seconds' as '0' which is ONLY VIABLE FOR A SINGLE NODE
-   CLUSTER. You should update this value according to your [cluster configuration](https://wiki.apache.org/cassandra/StorageConfiguration).
-   Columns marked with a gc_grace_seconds exist for a configured time period. More information can be found [here](http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_about_deletes_c.html)

 The <b>class</b> element specifying persistent fields which values should map to. This element contains;

-1. a parameter containing the Persistent class name e.g. <b>org.apache.gora.examples.generated.Employee</b>,
+1. a parameter containing the Persistent class name e.g. <b>org.apache.gora.cassandra.example.generated.AvroSerialization.CassandraRecord</b>,
+
+2. a parameter containing the keyClass e.g. <b>org.apache.gora.cassandra.example.generated.AvroSerialization.CassandraKey</b> which specifies the keys which map to the field values,
+
+3. a parameter containing the keyspace e.g. <b>RecordKeySpace</b> which matches to the above keyspace definition,
+
+4. a parameter containing the table name e.g. <b>CassandraRecord</b>,
+
+5. a parameter containing the allow filtering e.g. <b>true</b>, More about allow filtering can be found [here](https://www.datastax.com/dev/blog/allow-filtering-explained-2).
+
+6. a child element(s) <b>field</b> which represent fields which are to be persisted into Cassandra. These need to be configured such that they receive the following;
+
+   finally a parameter <b>name</b> e.g. (name, dateOfBirth, ssn and salary respectively) which map to Gora field name,
+
+   a parameter containing the column name e.g. <b>name</b>,
+
+   a parameter containing the data type of the column e.g. <b>text</b>,
+
+   an optional parameter <b>primarykey</b>, which indicates the primary key.
+
+
+The <b>cassandraKey</b> element specifying composite key fields which is used in keyClass, This is optional, this element should be added only when composite keys are available;
+
+1. a child element(s) <b>partitionKey</b> which represent cassandra partition key
+
+   a child element(s) <b>compositeKey</b> which represent cassandra composite partition key
+
+   a child element(s) <b>field</b> which represent partition key fields which are to be persisted into Cassandra. These need to be configured such that they receive the following;
+
+   a parameter <b>name</b> e.g. (name, dateOfBirth, ssn and salary respectively) which map to Gora field name,

-2. a parameter containing the keyClass e.g. <b>java.lang.String</b> which specifies the keys which map to the field values,
+   a parameter containing the column name e.g. <b>name</b>,

-3. a parameter containing the keyspace e.g. <b>Employee</b> which matches to the above keyspace definition,
+   a parameter containing the data type of the column <b>text</b>,

-4. finally a child element(s) <b>field</b> which represent fields which are to be persisted into Cassandra. These need to be configured such that they receive the following;
+2. a child element(s) <b>clusterKey</b> which represent cassandra cluster key

-   a parameter <b>name</b> e.g. (name, dateOfBirth, ssn and salary respectively),
+   a child element(s) <b>key</b> which represents column key fields which needs to be add clustered key.

-   a parameter containing the column <b>family</b> to which the field belongs e.g. (all p in this case),
+   a parameter containing the column name e.g. <b>name</b>,

-   an optional parameter <b>qualifier</b>, which enables more granular control over the data to be persisted into Cassandra.
+   a parameter containing the Order type of the column to be applied e.g. <b>desc</b>,

-   an optional patameter <b>ttl</b> (time to live): the value of the 'ttl' attribute should most likely always
-   be zero unless you want Cassandra to create Tombstones and delete portions of your
-   data once this period expires. Any positive value is read and bound to the number
-   of seconds after which the value for that field will disappear. The default value of ttl
-   is '0'.
\ No newline at end of file
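
For reference, the property names introduced by this revision could be combined into a gora.properties fragment along the following lines. This is only a sketch assembled from the table rows in the diff above, not a file taken from the Gora source tree. The username and password values are hypothetical placeholders (their example values are not visible in this diff), and AVRO is chosen here only because the example mapping uses the AvroSerialization classes.

```properties
# Sketch only: keys are the ones added or renamed in r1807139.
# Optional; the table says this defaults to gora-cassandra-mapping.xml.
gora.cassandrastore.mapping.file=/path/to/gora-cassandra-mapping.xml

# Required: host and CQL port of a running Cassandra node.
gora.cassandrastore.cassandraServers=localhost
gora.cassandrastore.port=9042

# Optional: cluster name ('Test Cluster' is the example value in the table).
gora.cassandrastore.clusterName=Test Cluster

# Optional: credentials. Both values below are hypothetical placeholders.
gora.cassandrastore.username=example_user
gora.cassandrastore.password=example_password

# Optional: AVRO or NATIVE; the table states NATIVE is the default.
gora.cassandrastore.cassandraSerializationType=AVRO
```

Note that the old host:port form (gora.cassandra.servers=localhost:9160) is replaced by separate host and port keys, so a configuration carried over from the previous documentation would need both keys set.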