[
https://issues.apache.org/jira/browse/CASSANDRA-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis R updated CASSANDRA-10085:
---------------------------------
Reproduced In: 2.2.0, 2.0.7 (was: 2.0.7, 2.2.0)
Description:
h4. Use case:
I want to use batch to keep data between tables synchronized with help of
conditionals. In particular, I want to have a ONE-to-ONE relationship in my
data model. I dont want to have an unintended upsert while trying to maintain a
consistent one-to-one relationship (write requests from other clients that try
to create a different relationship with one of the keys I inteded to use),
which is why I would like the IF NOT EXISTS when inserting data into multiple
tables in a batch.
But when trying to insert data in a batch-block with an IF NOT EXISTS, I get
the following response (reproduceable example further below):
{code}
cqlsh:testkeyspace>
BEGIN BATCH
INSERT INTO key1_to_key2 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS;
INSERT INTO key2_to_key1 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS;
APPLY BATCH;
Bad Request: Batch with conditions cannot span multiple tables{code}
The tables have a primary key on (partKey, key1) and (partKey, key2). Which
means that partKey in both cases is their partition key. Which in the example
also is the same value.
h4. Why I want to use a BATCH-statement that way:
In traditional databases, to design a one-to-one relationship, I would create
one table with two keys, both with a unique constraint:
{code:sql}
CREATE TABLE `myOneToOneTable` (
`ID` int NOT NULL,
`KEY_ONE` int NOT NULL,
`KEY_TWO` int NOT NULL,
// some other fields
PRIMARY KEY (`ID`),
UNIQUE KEY `KEY_ONE`,
UNIQUE KEY `KEY_TWO`
);
{code}
(simplified example taken from an [one-to-one example for
mysql|http://www.mkyong.com/mysql/how-to-define-one-to-one-relationship-in-mysql/])
As far as I know, unique constraints are not supported in Cassandra, so another
way around it would be to maintain the relationshipt between two tables, each
sharing the same partition key. For which I would like Cassandra to make sure
that when I insert data, either both are inserted at the same time or none (to
prevent other writing client to insert a different one-to-one relationship than
me).
[According to the documentation, this is what BATCH-Statements are used
for|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html]
{quote}
Instead, using batches to synchronize data to tables is a legitimate operation.
{quote}
I expected it to be able to do it, but as seen above, I cannot enforce INSERT
IF NOT EXISTS over multiple tables.
I do understand the background as to *why* there is a limitation coded that
way, so that not too many nodes have to take part in processing of the batch.
However, if the operations use
# primary keys in their statements for each table
# all primary keys contain the same value, thus resulting in the same hash for
partioning
# tables belong to the same keyspace
the batch statement should be able to be limited to the same amount of nodes as
if the batch contained INSERT IF NOT EXISTs for only one table.
h4. Steps to reproduce desired behaviour:
I used the [cassandra ova on virtual
box|http://www.planetcassandra.org/install-cassandra-ova-on-virtualbox/], but
also a Windows Setup with 2.2.0
{code}
Connected to Cluster on a Stick at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE conditionalBatch
WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1};
cqlsh> use conditionalbatch ;
cqlsh:conditionalbatch> CREATE TABLE key1_to_key2 (
partKey int,
key1 int,
key2 int,
PRIMARY KEY (partKey, key1)
);
cqlsh:conditionalbatch> CREATE TABLE key2_to_key1 ( partKey int, key2 int, key1
int, PRIMARY KEY (partKey, key2)
);
cqlsh:conditionalbatch> BEGIN BATCH
INSERT INTO key1_to_key2 (partKey, key1, key2) VALUES (1,2,3) IF NOT EXISTS;
INSERT INTO key2_to_key1 (partKey, Key1, key2) VALUES (1,2,3) IF NOT EXISTS;
APPLY BATCH;
{code}
h4. What should happen:
Either succeding, or:
{code}
applied | partkey | key1 | key2|
-----------+---------+------+------
false | 1 | 2 | 3
false | 1 | 3 | 2
{code}
h4. What happens instead:
{code}
Bad Request: Batch with conditions cannot span multiple tables
{code}
In my opinion, this is a very crucial feature wich enables the users to
maintain unique constraints with a relatively small amount of work. There
certainly are other ways to maintain unique constrains, but I have not found a
trivial way within cassandra that lets me that easily. But maybe I have
overlooked something though.
was:
h4. Use case:
I want to use batch to keep data between tables synchronized with help of
conditionals. In particular, I want to have a ONE-to-ONE relationship in my
data model. I dont want to have an unintended upsert while trying to maintain a
consistent one-to-one relationship (write requests from other clients that try
to create a different relationship with one of the keys I inteded to use),
which is why I would like the IF NOT EXISTS when inserting data into multiple
tables in a batch.
But when trying to insert data in a batch-block with an IF NOT EXISTS, I get
the following response (reproduceable example further below):
{code}
cqlsh:testkeyspace>
BEGIN BATCH
INSERT INTO key1_to_key2 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS;
INSERT INTO key2_to_key1 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS;
APPLY BATCH;
Bad Request: Batch with conditions cannot span multiple tables
{code}
The tables have a primary key on (partKey, key1) and (partKey, key2). Which
means that partKey in both cases is their partition key. Which in the example
also is the same value.
h4. Why I want to use a BATCH-statement that way:
In traditional databases, to design a one-to-one relationship, I would create
one table with two keys, both with a unique constraint:
{code:sql}
CREATE TABLE `myOneToOneTable` (
`ID` int NOT NULL,
`KEY_ONE` int NOT NULL,
`KEY_TWO` int NOT NULL,
// some other fields
PRIMARY KEY (`ID`),
UNIQUE KEY `KEY_ONE`,
UNIQUE KEY `KEY_TWO`
);
{code}
(simplified example taken from an [one-to-one example for
mysql|http://www.mkyong.com/mysql/how-to-define-one-to-one-relationship-in-mysql/])
As far as I know, unique constraints are not supported in Cassandra, so another
way around it would be to maintain the relationshipt between two tables, each
sharing the same partition key. For which I would like Cassandra to make sure
that when I insert data, either both are inserted at the same time or none (to
prevent other writing client to insert a different one-to-one relationship than
me).
[According to the documentation, this is what BATCH-Statements are used
for|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html]
{quote}
Instead, using batches to synchronize data to tables is a legitimate operation.
{quote}
I expected it to be able to do it, but as seen above, I cannot enforce INSERT
IF NOT EXISTS over multiple tables.
I do understand the background as to *why* there is a limitation coded that
way, so that not too many nodes have to take part in processing of the batch.
However, if the operations use
# primary keys in their statements for each table
# all primary keys contain the same value, thus resulting in the same hash for
partioning
# tables belong to the same keyspace
the batch statement should be able to be limited to the same amount of nodes as
if the batch contained INSERT IF NOT EXISTs for only one table.
h4. Steps to reproduce desired behaviour:
I used the [cassandra ova on virtual
box|http://www.planetcassandra.org/install-cassandra-ova-on-virtualbox/], but
also a Windows Setup with 2.2.0
{code}
Connected to Cluster on a Stick at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE conditionalBatch
WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1};
cqlsh> use conditionalbatch ;
cqlsh:conditionalbatch> CREATE TABLE key1_to_key2 (
partKey int,
key1 int,
key2 int,
PRIMARY KEY (partKey, key1)
);
cqlsh:conditionalbatch> CREATE TABLE key2_to_key1 ( partKey int, key2 int, key1
int, PRIMARY KEY (partKey, key2)
);
cqlsh:conditionalbatch> BEGIN BATCH
INSERT INTO key1_to_key2 (partKey, key1, key2) VALUES (1,2,3) IF NOT EXISTS;
INSERT INTO key2_to_key1 (partKey, Key1, key2) VALUES (1,2,3) IF NOT EXISTS;
APPLY BATCH;
{code}
h4. What should happen:
Either succeding, or:
{code}
applied | partkey | key1 | key2|
-----------+---------+------+------
false | 1 | 2 | 3
false | 1 | 3 | 2
{code}
h4. What happens instead:
{code}
Bad Request: Batch with conditions cannot span multiple tables
{code}
In my opinion, this is a very crucial feature wich enables the users to
maintain unique constraints with a relatively small amount of work. There
certainly are other ways to maintain unique constrains, but I have not found a
trivial way within cassandra that lets me that easily. But maybe I have
overlooked something though.
> Allow BATCH with conditions to span multiple tables with same partition key
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-10085
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10085
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Dennis R
> Labels: batch, conditional-statement, partitioning
>
> h4. Use case:
> I want to use batch to keep data between tables synchronized with help of
> conditionals. In particular, I want to have a ONE-to-ONE relationship in my
> data model. I dont want to have an unintended upsert while trying to maintain
> a consistent one-to-one relationship (write requests from other clients that
> try to create a different relationship with one of the keys I inteded to
> use), which is why I would like the IF NOT EXISTS when inserting data into
> multiple tables in a batch.
> But when trying to insert data in a batch-block with an IF NOT EXISTS, I get
> the following response (reproduceable example further below):
> {code}
> cqlsh:testkeyspace>
> BEGIN BATCH
> INSERT INTO key1_to_key2 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS;
> INSERT INTO key2_to_key1 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS;
> APPLY BATCH;
> Bad Request: Batch with conditions cannot span multiple tables{code}
> The tables have a primary key on (partKey, key1) and (partKey, key2). Which
> means that partKey in both cases is their partition key. Which in the example
> also is the same value.
> h4. Why I want to use a BATCH-statement that way:
> In traditional databases, to design a one-to-one relationship, I would create
> one table with two keys, both with a unique constraint:
> {code:sql}
> CREATE TABLE `myOneToOneTable` (
> `ID` int NOT NULL,
> `KEY_ONE` int NOT NULL,
> `KEY_TWO` int NOT NULL,
> // some other fields
> PRIMARY KEY (`ID`),
> UNIQUE KEY `KEY_ONE`,
> UNIQUE KEY `KEY_TWO`
> );
> {code}
> (simplified example taken from an [one-to-one example for
> mysql|http://www.mkyong.com/mysql/how-to-define-one-to-one-relationship-in-mysql/])
> As far as I know, unique constraints are not supported in Cassandra, so
> another way around it would be to maintain the relationshipt between two
> tables, each sharing the same partition key. For which I would like Cassandra
> to make sure that when I insert data, either both are inserted at the same
> time or none (to prevent other writing client to insert a different
> one-to-one relationship than me).
> [According to the documentation, this is what BATCH-Statements are used
> for|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html]
> {quote}
> Instead, using batches to synchronize data to tables is a legitimate
> operation.
> {quote}
> I expected it to be able to do it, but as seen above, I cannot enforce INSERT
> IF NOT EXISTS over multiple tables.
> I do understand the background as to *why* there is a limitation coded that
> way, so that not too many nodes have to take part in processing of the batch.
> However, if the operations use
> # primary keys in their statements for each table
> # all primary keys contain the same value, thus resulting in the same hash
> for partioning
> # tables belong to the same keyspace
> the batch statement should be able to be limited to the same amount of nodes
> as if the batch contained INSERT IF NOT EXISTs for only one table.
> h4. Steps to reproduce desired behaviour:
> I used the [cassandra ova on virtual
> box|http://www.planetcassandra.org/install-cassandra-ova-on-virtualbox/], but
> also a Windows Setup with 2.2.0
> {code}
> Connected to Cluster on a Stick at localhost:9160.
>
> [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
>
> Use HELP for help.
>
> cqlsh> CREATE KEYSPACE conditionalBatch
>
> WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1};
> cqlsh> use conditionalbatch ;
>
> cqlsh:conditionalbatch> CREATE TABLE key1_to_key2 (
>
> partKey int,
> key1 int,
> key2 int,
> PRIMARY KEY (partKey, key1)
> );
> cqlsh:conditionalbatch> CREATE TABLE key2_to_key1 ( partKey int, key2 int,
> key1
> int, PRIMARY KEY (partKey, key2)
> );
> cqlsh:conditionalbatch> BEGIN BATCH
>
> INSERT INTO key1_to_key2 (partKey, key1, key2) VALUES (1,2,3) IF NOT EXISTS;
>
> INSERT INTO key2_to_key1 (partKey, Key1, key2) VALUES (1,2,3) IF NOT EXISTS;
>
> APPLY BATCH;
> {code}
> h4. What should happen:
> Either succeding, or:
> {code}
> applied | partkey | key1 | key2|
>
> -----------+---------+------+------
>
> false | 1 | 2 | 3
>
> false | 1 | 3 | 2
> {code}
> h4. What happens instead:
> {code}
> Bad Request: Batch with conditions cannot span multiple tables
> {code}
> In my opinion, this is a very crucial feature wich enables the users to
> maintain unique constraints with a relatively small amount of work. There
> certainly are other ways to maintain unique constrains, but I have not found
> a trivial way within cassandra that lets me that easily. But maybe I have
> overlooked something though.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)