Re: Necessary consistency level for LWT writes

2019-05-23 Thread Craig Pastro
Dear Hiro,

Thank you! Yes, this is exactly my understanding. Is this explicitly
written down anywhere, or can an expert confirm? I suppose I should read
the code, but it's a little intimidating.

Best wishes,
Craig





Re: Necessary consistency level for LWT writes

2019-05-23 Thread Hiroyuki Yamada
Hi Craig,

I think I now understand what the Python driver doc is saying.

As long as `serial_consistency_level` is set to SERIAL for the Paxos phase
and `consistency_level` is set to SERIAL for the later read,
conflicts in the Paxos table can be properly detected, so
`consistency_level` for the commit phase can be anything (it can even be
ANY, as the doc says).
An unfinished record write (commit), if any, will be repaired by the read.
But if `consistency_level` is set to something else (like ALL) for the
later read, the read won't be able to detect conflicts in the Paxos table,
so it does not work as expected.
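The overlap argument in this thread can be put into numbers with a simplified, single-datacenter model. This is a sketch, not Cassandra's implementation, and the helper names are hypothetical. Two replica sets drawn from `rf` replicas are guaranteed to share a member only when their sizes add up to more than `rf`. ANY may be acknowledged by a hint stored on the coordinator alone, so it guarantees zero replicas, while the Paxos accept round of an LWT and a SERIAL read each involve a quorum.

```python
# Simplified single-datacenter model (hypothetical helper names): how many
# replicas each consistency level guarantees were involved when the request
# returns, for a replication factor `rf`.
def replicas_guaranteed(level: str, rf: int) -> int:
    quorum = rf // 2 + 1
    levels = {
        "ANY": 0,          # may be satisfied by a hint on the coordinator only
        "ONE": 1,
        "QUORUM": quorum,
        "SERIAL": quorum,  # Paxos rounds (LWT accept, SERIAL read) use a quorum
        "ALL": rf,
    }
    return levels[level]


def overlap_guaranteed(write_level: str, read_level: str, rf: int) -> bool:
    """Pigeonhole argument: two replica sets of sizes w and r, drawn from rf
    replicas, are guaranteed to share a member exactly when w + r > rf."""
    w = replicas_guaranteed(write_level, rf)
    r = replicas_guaranteed(read_level, rf)
    return w + r > rf


if __name__ == "__main__":
    rf = 3
    # Commit at ANY, later read of the base table at ALL: no guaranteed overlap.
    print(overlap_guaranteed("ANY", "ALL", rf))        # False
    # LWT accept quorum vs. the quorum a SERIAL read prepares on: overlap.
    print(overlap_guaranteed("SERIAL", "SERIAL", rf))  # True
    # Commit at ONE, plain QUORUM read: no guaranteed overlap at rf=3.
    print(overlap_guaranteed("ONE", "QUORUM", rf))     # False
```

Under these assumptions, a commit at ANY followed by a read at ALL has no guaranteed overlap (0 + 3 is not greater than 3), which matches the driver doc's warning, while the Paxos accept quorum and the quorum of a SERIAL read always intersect.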

Craig, I'm not sure if this answers your question, but does it make sense?

C* professionals, is this understanding correct?

Thanks,
Hiro


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Necessary consistency level for LWT writes

2019-05-23 Thread Craig Pastro
Dear Hiro,

Thank you for your response!

Hmm, I think my understanding is slightly different. Please let me try to
explain one situation and let me know what you think.

1. Do an LWT write with serial_consistency = SERIAL (default) and
consistency = ONE.
2. The LWT starts its Paxos phase and has communicated with a quorum of nodes.
3. At this point, a read of that data is initiated with consistency = SERIAL.

Now, here is where I am confused. What I think happens is that a SERIAL
read will read from a quorum of nodes, detect that the Paxos phase is
underway and... maybe wait until it is over before responding with the
latest data? The Paxos phase happens among a quorum, so even though the
write's consistency level is ONE (or indeed ANY, as the Python docs
state), doing the read with SERIAL means the write effectively took place
at a consistency level equivalent to QUORUM.

I also think that a read with a consistency level of QUORUM or ALL,
initiated while the Paxos phase is underway, will not detect that a Paxos
phase is underway and will return the old data.
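The scenario above can be played out in a toy model (hypothetical names; three replicas; the LWT's proposal has been accepted by a quorum, but its commit, sent at a weak consistency level such as ANY, has not reached any replica yet). Following Hiro's description of the read repairing an unfinished commit, the SERIAL read in this sketch does not wait: its Paxos prepare finds the accepted-but-uncommitted proposal on at least one replica of its quorum and commits it before reading. A plain read at ALL looks only at the base table and returns the old value. This is a simplification, not Cassandra's actual Paxos code.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Replica:
    value: str                                  # row as stored in the base table
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) accepted, not committed


def read_all(replicas: List[Replica]) -> Set[str]:
    # A regular read at ALL consults only the base table on every replica.
    return {r.value for r in replicas}


def read_serial(replicas: List[Replica], quorum: List[int]) -> Set[str]:
    # Toy SERIAL read: a Paxos prepare on a quorum looks for proposals that
    # were accepted but not committed, and finishes the newest one first.
    in_progress = [replicas[i].accepted for i in quorum if replicas[i].accepted]
    if in_progress:
        _ballot, value = max(in_progress)  # highest ballot wins
        for i in quorum:                   # commit it on the quorum
            replicas[i].value = value
            replicas[i].accepted = None
    return {replicas[i].value for i in quorum}


if __name__ == "__main__":
    replicas = [Replica("old") for _ in range(3)]
    # Steps 1-2: the proposal is accepted by a quorum (replicas 0 and 1),
    # but the commit has not landed on any replica yet.
    for i in (0, 1):
        replicas[i].accepted = (1, "new")
    print(read_all(replicas))             # {'old'}
    print(read_serial(replicas, [1, 2]))  # {'new'}
```

Because any two of the three replicas include replica 0 or 1, every possible SERIAL quorum in this model sees the in-progress proposal.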

Is this correct?

Thank you for any help!

Best wishes,
Craig








Re: Necessary consistency level for LWT writes

2019-05-23 Thread Hiroyuki Yamada
Hi Craig,

I'm not 100% sure about some corner cases,
but I'm sure that LWTs should usually be used with the following
consistency levels.

LWT write:
serial_consistency_level: SERIAL
consistency_level: QUORUM

LWT read:
consistency_level: SERIAL
(It's a bit weird and misleading as a design that you can set SERIAL
as the consistency_level for reads, whereas you can't for writes.)

BTW, I doubt the Python doc is correct, especially in the following part:
"But if the regular consistency_level of that write is ANY, then only
a read with a consistency_level of SERIAL is guaranteed to see it
(even a read with consistency ALL is not guaranteed to be enough)."
Is this really true?
It doesn't really make sense to me, because a SERIAL read mostly returns
after seeing a quorum of replicas,
and a write with ANY mostly returns after writing one replica, so they
don't overlap in that case.
It would be great if anyone could clarify this.

Thanks,
Hiro





schema for testing that has a lot of edge cases

2019-05-23 Thread Carl Mueller
Does anyone have a schema (or schema generator) that can be used for
general testing and has lots of complicated aspects and data?

For example: a bunch of different rk/ck variations, column data types,
and altered/added columns and data (which can impact SSTables and
compaction).

Mischievous data to prepopulate (such as
https://github.com/minimaxir/big-list-of-naughty-strings for strings, ugly
keys in maps, semi-evil column names), of sufficient size to be spread
across most nodes of a 3-5 node cluster.

superwide rows
large key values

version-specific features for 2.1, 2.2, 3.x, 4.x

I'd be happy to centralize this in a GitHub repo if it doesn't exist
anywhere yet.
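A starting point for the schema side could be a small generator of CQL DDL. This is a hypothetical sketch: the keyspace, table and column names and the type choices are assumptions, it only varies partition/clustering key shapes and column types, and it does not cover data, ALTERed columns or version-specific features. The generated statements have not been validated against every version from 2.1 to 4.x.

```python
from itertools import product
from typing import Iterator, Sequence

# Illustrative choices (assumptions, not an existing test suite): key column
# types, regular columns, keyspace and table naming are all hypothetical.
KEY_TYPES = ["int", "text", "uuid", "timestamp", "blob"]
VALUE_COLUMNS = {
    "v_map": "map<text, int>",
    "v_list": "list<text>",
    "v_set": "set<int>",
    "v_tuple": "frozen<tuple<int, text>>",
    '"MixedCase"': "text",  # quoted, case-sensitive column name
    '"select"': "text",     # reserved word, only legal when quoted
}


def create_table(keyspace: str, table: str,
                 pk_types: Sequence[str], ck_types: Sequence[str]) -> str:
    pks = [f"pk{i}" for i in range(len(pk_types))]
    cks = [f"ck{i}" for i in range(len(ck_types))]
    columns = [f"{c} {t}" for c, t in zip(pks + cks, [*pk_types, *ck_types])]
    columns += [f"{c} {t}" for c, t in VALUE_COLUMNS.items()]
    primary_key = ", ".join([f"({', '.join(pks)})", *cks])
    return (f"CREATE TABLE {keyspace}.{table} ("
            f"{', '.join(columns)}, PRIMARY KEY ({primary_key}));")


def schema_variations(keyspace: str = "edge_cases") -> Iterator[str]:
    # 1-2 partition key columns x 0-2 clustering columns, cycling through
    # KEY_TYPES so that each table mixes different key types.
    for n_pk, n_ck in product((1, 2), (0, 1, 2)):
        pk_types = [KEY_TYPES[i % len(KEY_TYPES)] for i in range(n_pk)]
        ck_types = [KEY_TYPES[(n_pk + i) % len(KEY_TYPES)] for i in range(n_ck)]
        yield create_table(keyspace, f"t_pk{n_pk}_ck{n_ck}", pk_types, ck_types)


if __name__ == "__main__":
    for statement in schema_variations():
        print(statement)
```

Naughty-string data and ALTER TABLE steps could be layered on top of the same generator.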


Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-23 Thread Attila Wind

Hi again,

so, staying with a) for a second...
"Why am I using ALLOW FILTERING in the first place?"
Fully agreed! To put it this way: as a reviewer, I never want to see the
string "allow filtering" in any SELECT issued by production code. I
clearly consider it an indicator of a wrong DB design.
Still, there are use cases (and if I am not mistaken, the original
question was about one) where, for whatever reason, PEOPLE run such
selects manually. E.g. where we use Cassandra, we have things
like this:  for analysis purposes. So I think this is a valid use case.
And once we have found a valid use case, the question stands, right? So
back to the question: "But only in case you do not provide partitioning
key right?" - I assume the answer is yes, right? :-)
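Attila's review rule could be approximated with a client-side check in code review or CI. This is a hypothetical helper, not a Cassandra feature, and it does not restrict what users can run in cqlsh, so it does not answer the per-user question from the original post. It is a regular expression over the statement text: it blanks out single-quoted CQL string literals first, but it does not understand comments or $$-quoted strings.

```python
import re

# Hypothetical lint helper (not a Cassandra feature): flags CQL text that
# uses ALLOW FILTERING. Single-quoted string literals ('' escapes a quote)
# are blanked first so that string values cannot trigger a false positive.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
ALLOW_FILTERING = re.compile(r"\bALLOW\s+FILTERING\b", re.IGNORECASE)


def uses_allow_filtering(cql: str) -> bool:
    without_literals = STRING_LITERAL.sub("''", cql)
    return ALLOW_FILTERING.search(without_literals) is not None


def review(statements):
    """Return the statements a reviewer would want to look at."""
    return [s for s in statements if uses_allow_filtering(s)]


if __name__ == "__main__":
    queries = [
        "SELECT * FROM ks.events WHERE id = 42",
        "SELECT * FROM ks.events WHERE kind = 'x' ALLOW FILTERING",
        "INSERT INTO ks.notes (id, body) VALUES (1, 'allow filtering')",
    ]
    print(review(queries))  # only the second query
```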


b) "I think it can justify the unresponsiveness. When using ALLOW 
FILTERING, you are doing something like a full table scan in a 
relational database"
I get it. Sure. But is Cassandra so "single threaded" that a node running
one(!) big, expensive query becomes fully unresponsive? I doubt it...
That's what I meant by saying "does not explain or justify". From my
perspective, I definitely consider this kind of unresponsiveness an
abnormal state...


cheers

Attila





Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-23 Thread shalom sagges
> a) Interesting... But only in case you do not provide partitioning key
> right? (so IN() is for partitioning key?)

I think you should ask yourself a different question: why am I using ALLOW
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data into multiple tables, or at least create an
index on the requested column (preferably queried together with a known
partition key).
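Denormalizing, as suggested here, means keeping one table per query pattern and writing every row to each of them, so that each query becomes a partition key lookup instead of a filtered scan. A minimal in-memory sketch, with plain dicts standing in for two hypothetical tables (`users_by_id` and `users_by_email`); it is an illustration of the pattern, not driver code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


# Hypothetical query tables, each keyed by the column it is queried by.
users_by_id: Dict[str, User] = {}
users_by_email: Dict[str, User] = {}


def save_user(user: User) -> None:
    # Denormalized write: the same row goes to every query table. In
    # Cassandra, a logged BATCH is one way to keep such writes together.
    old = users_by_id.get(user.user_id)
    if old is not None and old.email != user.email:
        del users_by_email[old.email]  # drop the stale lookup entry
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user


def find_by_email(email: str) -> Optional[User]:
    # A key lookup in the table partitioned by email, instead of scanning
    # users_by_id with a filter (the ALLOW FILTERING equivalent).
    return users_by_email.get(email)


if __name__ == "__main__":
    save_user(User("u1", "ann@example.com", "Ann"))
    print(find_by_email("ann@example.com"))  # User(user_id='u1', email='ann@example.com', name='Ann')
```

The cost moves to the write path: every table has to be updated, including removing entries whose key changed.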

> b) Still does not explain or justify "all 8 nodes to halt and
> unresponsiveness to external requests" behavior... Even if servers are busy
> with the request seriously becoming non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW FILTERING,
you are doing something like a full table scan in a relational database.
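The difference between a partition key lookup and a filtering scan can be illustrated with a toy token ring. Assumptions: 8 nodes as in the original question, one evenly spaced token per node, RF 3 with replicas placed on the owning node and the next two clockwise, MD5 standing in for Cassandra's Murmur3 partitioner, and no vnodes, racks or datacenters. A query restricted to a partition key maps to one token and so to RF replicas; a query without one has to read every token range, which in this sketch touches every node.

```python
import bisect
import hashlib
from typing import List, Tuple

# Toy ring (assumptions): 8 nodes with one evenly spaced token each, RF 3,
# replicas = owning node plus the next two clockwise. MD5 stands in for
# Cassandra's Murmur3 partitioner; vnodes and racks/DCs are ignored.
NODES = [f"node{i}" for i in range(8)]
TOKENS = [i * (2**64 // len(NODES)) for i in range(len(NODES))]
RF = 3


def token(partition_key: str) -> int:
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


def replicas_for(partition_key: str) -> List[str]:
    """Replicas of one partition: the owner of the range containing its
    token (first node token >= it, wrapping around) and its successors."""
    owner = bisect.bisect_left(TOKENS, token(partition_key)) % len(NODES)
    return [NODES[(owner + k) % len(NODES)] for k in range(RF)]


def scan_reads(replicas_per_range: int) -> List[Tuple[str, int]]:
    """(node, range) reads for a query with no partition key restriction:
    every token range must be read from `replicas_per_range` replicas."""
    reads = []
    for r in range(len(TOKENS)):  # range r is owned by NODES[r]
        for k in range(replicas_per_range):
            reads.append((NODES[(r + k) % len(NODES)], r))
    return reads


if __name__ == "__main__":
    print(replicas_for("user42"))  # 3 nodes; a QUORUM read needs 2 of them
    reads = scan_reads(2)          # QUORUM with RF 3, for every range
    print(len(reads), len({node for node, _ in reads}))  # 16 8
```

In this model the answer to Attila's question a) is yes: only the query without a partition key restriction fans out to the whole ring.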

There is a lot of information on the internet regarding this subject, such
as
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind  wrote:

> Hi,
>
> "When you run a query with allow filtering, Cassandra doesn't know where
> the data is located, so it has to go node by node, searching for the
> requested data."
>
> a) Interesting... But only in case you do not provide partitioning key
> right? (so IN() is for partitioning key?)
>
> b) Still does not explain or justify "all 8 nodes to halt and
> unresponsiveness to external requests" behavior... Even if servers are busy
> with the request seriously becoming non-responsive...?
>
> cheers
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +36 31 7811355
>
>
> On 2019. 05. 23. 0:37, shalom sagges wrote:
>
> Hi Vsevolod,
>
> 1) Why such behavior? I thought any given SELECT request is handled by a
> limited subset of C* nodes and not by all of them, as per the connection
> consistency/table replication settings.
> When you run a query with allow filtering, Cassandra doesn't know where
> the data is located, so it has to go node by node, searching for the
> requested data.
>
> 2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
> I'm not familiar with such a flag. In my case, I just try to educate the
> R teams.
>
> Regards,
>
> On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov 
> wrote:
>
>> Hello everyone,
>>
>> We have an 8-node C* cluster with a large volume of unbalanced data. Usual
>> per-partition selects work somewhat fine and are processed by a limited
>> number of nodes, but if a user issues SELECT WHERE IN () ALLOW FILTERING,
>> the command brings all 8 nodes to a halt, unresponsive to external
>> requests, while disk IO jumps to 100% across the whole cluster. After
>> several minutes all nodes seem to finish processing the request and the
>> cluster goes back to being responsive. The replication factor for all
>> data is 3.
>>
>> 1) Why such behavior? I thought any given SELECT request is handled by a
>> limited subset of C* nodes and not by all of them, as per the connection
>> consistency/table replication settings.
>>
>> 2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
>>
>> Thank you all very much in advance,
>> Vsevolod Filaretov.
>>
>


Necessary consistency level for LWT writes

2019-05-23 Thread Craig Pastro
Hello!

I am trying to understand the consistency level (not serial consistency)
required for LWTs. Basically, I am trying to understand whether a
consistency level of ONE is enough for an LWT write operation if I do my
read with a consistency level of SERIAL.

It would seem so, based on what is written for the DataStax Python driver:

http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level

However, that is the only place where I can find this information, so I am
a little hesitant to believe it 100%.

By the way, I did find basically the same question (
https://www.mail-archive.com/user@cassandra.apache.org/msg45453.html), but I
am unsure whether the answer there really answers my question.

Thank you in advance for any help!

Best regards,
Craig