Re: Batch : Isolation and Atomicity for same partition on multiple table

2017-12-15 Thread Mickael Delanoë
Yes, we try to rely on conditional batches when possible, but in this case
they could not be used.
We ran some tests with conditional batches, and they cannot be applied when
several tables are involved in the batch, even if the tables use the same
partition key: we got the error "batch with conditions cannot span multiple
tables".
So they do not apply to our case.
Moreover, we would like "isolation" to guarantee that a read occurring while
the batch is being applied sees all the data in every table (not only part
of it), which is not achievable with conditional batches.
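
As an illustration, a batch of the following shape reproduces the situation
described above. It assumes the tableA and tableB definitions from the first
message of this thread; the IF NOT EXISTS condition is an illustrative
choice, not necessarily the one used in the tests.

```cql
BEGIN BATCH
-- Conditional statement on tableA
INSERT INTO tableA (user_id, clustering, value)
    VALUES ('1234', 'c1', 'val1') IF NOT EXISTS;
-- Second statement targets a different table, same partition key value
INSERT INTO tableB (user_id, clustering1, clustering2, value)
    VALUES ('1234', 'cl1', 'cl2', 'avalue');
APPLY BATCH;
-- Reported result: the batch is rejected with
-- "batch with conditions cannot span multiple tables"
```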

Mickaël





Re: Batch : Isolation and Atomicity for same partition on multiple table

2017-12-14 Thread Jeff Jirsa
Again, a lot of potential problems can be solved with data modeling - in 
particular consider things like conditional batches where the condition is on a 
static cell/column and writes go to different CQL rows. 
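
A minimal sketch of that idea, under stated assumptions: the two tables from
the original question are merged into one hypothetical table (user_data),
with a hypothetical "kind" clustering column to tell the former tableA and
tableB rows apart and a hypothetical static "version" column to carry the
condition. Whether this fits depends on the actual use case.

```cql
-- Hypothetical merged table: one partition per user_id holds both row kinds.
CREATE TABLE user_data (
   user_id text,
   kind text,            -- 'A' or 'B' (illustrative discriminator)
   clustering1 text,
   clustering2 text,     -- left as '' for rows that only need one level
   value text,
   version int static,   -- one value per partition, used for the condition
   PRIMARY KEY (user_id, kind, clustering1, clustering2));

-- All statements target the same partition of the same table, so the
-- batch may carry a condition. Assumes version was previously set to 1.
BEGIN BATCH
UPDATE user_data SET version = 2 WHERE user_id = '1234' IF version = 1;
INSERT INTO user_data (user_id, kind, clustering1, clustering2, value)
    VALUES ('1234', 'A', 'c1', '', 'val1');
INSERT INTO user_data (user_id, kind, clustering1, clustering2, value)
    VALUES ('1234', 'B', 'cl1', 'cl2', 'avalue');
APPLY BATCH;
```

Note that a conditional batch goes through lightweight transactions, so it
costs more than a plain batch.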

-- 
Jeff Jirsa




Re: Batch : Isolation and Atomicity for same partition on multiple table

2017-12-14 Thread Mickael Delanoë
Thanks Jeff,
I am a little disappointed to hear that the guarantees are even weaker. But
I will take a look at this and try to understand what is really done.





Re: Batch : Isolation and Atomicity for same partition on multiple table

2017-12-13 Thread Jeff Jirsa
Entry point is here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/BatchStatement.java#L346
, which will call through to
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageProxy.java#L938-L953

I believe the guarantees are weaker than the blog suggests, but it's
nuanced, and a lot of these types of questions come down to data model (you
can model it in a way that you can avoid problems with weaknesses in
isolation, but that requires a detailed explanation of your use case, etc).






Re: Batch : Isolation and Atomicity for same partition on multiple table

2017-12-13 Thread Mickael Delanoë
Hi Nicolas,
Thanks for your answer.
Are you 100% sure of that assumption?
The few tests I did, using nodetool getendpoints, showed that the data for
the two tables went to the same nodes when I used the same partition key.
So I would have expected Cassandra to be smart enough to apply them to the
memtable in a single operation, achieving isolation, since the whole batch
is executed on a single node.
Does anybody know where batch operations are processed in the Cassandra
source code, so I can check how all this is handled?

Regards,
Mickaël





-- 
Mickaël Delanoë


Re: Batch : Isolation and Atomicity for same partition on multiple table

2017-12-13 Thread Nicolas Guyomar
Hi Mickael,

Partitions are tied to the table they exist in, so in your case you are
targeting 2 partitions in 2 different tables.
Therefore, IMHO, you will only get atomicity from your batch statement.
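
For contrast, a batch of the following shape does target a single partition,
because both rows live in the same table under the same partition key. This
is only a sketch based on the tableA definition from the original question;
the second row's values are made up for illustration.

```cql
BEGIN BATCH
-- Both statements write to tableA with user_id = '1234':
-- same table, same partition key, hence one partition.
INSERT INTO tableA (user_id, clustering, value)
    VALUES ('1234', 'c1', 'val1');
INSERT INTO tableA (user_id, clustering, value)
    VALUES ('1234', 'c2', 'val2');
APPLY BATCH;
```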



Batch : Isolation and Atomicity for same partition on multiple table

2017-12-11 Thread Mickael Delanoë
Hello,

I have a question regarding batch isolation and atomicity for queries using
the same partition key.

The Datastax documentation says this about batches:
"Combines multiple DML statements to achieve atomicity and isolation when
targeting a single partition or only atomicity when targeting multiple
partitions. A batch applies all DMLs within a single partition before the
data is available, ensuring atomicity and isolation."

But I am trying to find out exactly what counts as a "single partition",
and I have not found a clear answer yet. The examples and explanations
always deal with a partition of a single table used inside the batch. My
concern is what happens when a batch uses different tables. So I would like
some clarification.

Here is my use case: I have 2 tables with the same partition key, which is
"user_id":

CREATE TABLE tableA (
   user_id text,
   clustering text,
   value text,
   PRIMARY KEY (user_id, clustering));

CREATE TABLE tableB (
   user_id text,
   clustering1 text,
   clustering2 text,
   value text,
   PRIMARY KEY (user_id, clustering1, clustering2));

If I run a batch like this:

BEGIN BATCH
INSERT INTO tableA (user_id, clustering, value) VALUES ('1234', 'c1',
'val1');
INSERT INTO tableB (user_id, clustering1, clustering2, value) VALUES
('1234', 'cl1', 'cl2', 'avalue');
APPLY BATCH;

The DML statements use the same partition key. Can we say they target the
same partition or, since the partition keys belong to different tables,
should we consider them different partitions? And so does this batch ensure
atomicity and isolation (in the sense described in the Datastax doc), or
only atomicity?

Thanks for your help,
Mickaël Delanoë