Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks Tyler for the clarification. I've opened a ticket, CASSANDRA-7982.
For now, I've assigned it to myself and put you as a reviewer. Please change
the assignment as you prefer.

Assume that we now batch the requests & send only one request to the
replica:

What extra overhead do vnodes incur when the replica processes the secondary
index request? In other words, does the replica still have to fire
individual queries internally for all the token ranges
[(max(-9193352069377957523),
max(-9136021049555745100), etc.], or can it be optimized to be done in one
shot? If it takes multiple queries, how much overhead does that add (in terms
of latency because of multiple disk lookups, etc.)?

Would you mind pointing me to the C* code location (class/method) so I can
explore more?

Also, can you help me understand what min() and max() mean in the trace
output?
[min(-9223372036854775808), max(-9193352069377957523)] vs.
(max(-8959555493872108621),
max(-8929774302283364912)]

Jay



On Fri, Sep 19, 2014 at 3:28 PM, Tyler Hobbs  wrote:

>
> On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel  wrote:
>
>>
>> When coordinator fires indexed scan request to node 192.168.51.22, why
>> don't it ask that node to check all of its (at least primary) ranges for
>> the queried data, at once. Also, internally that node should be able to
>> just do one scan through all of the ranges held by it, isn't it?
>> (e.g. [min(-9223372036854775808), max(-9193352069377957523), and
>> (max(-9136021049555745100), max(-8959555493872108621)], etc. ]
>>
>> Seems like it needs to query data in token order. So,
>> min(-9223372036854775808), max(-*9193352069377957523*) on 192.168.51.22.
>> But next range ((max(-*9193352069377957523*), max(-*9136021049555745100*)])
>> is on 192.168.51.25 so fire query there. Then, next range  (max(-
>> *9136021049555745100*), max(-8959555493872108621)] again on
>> 192.168.51.22. Btw,, I'm not too sure regarding min/max or max/max in trace
>> output.
>>
>
> The coordinator certainly could batch multiple range requests that are
> going to the same replica.  It's an optimization that would primarily help
> the empty table/high cardinality case, but you're welcome to open a
> ticket.  3.0 is the earliest this would make it in.
>
>
>>
>> I found below comment in
>> https://issues.apache.org/jira/browse/CASSANDRA-4858.
>> "The problem is that we have to scan the nodes in token order so we dont
>> break the existing API's, if we do so then we are sending a lot more
>> requests and waiting for the response than the number of nodes. "
>> Don't understand the restriction though - "don't break the existing
>> API's".
>>
>
> I think he's just saying that we have to make sure we return results in
> token order (and if there's a limit on the query, return the first N
> results when listed in token order).
>
>
>>
>> With non-vnode, it only queries a particular node only one time..Btw, in
>> the worst case, I understand secondary index query has to scan all the
>> nodes in cluster sometime (empty table or high cardinality index?) but I
>> don't understand why vnode makes it to scan the *same node *multiple
>> times. If RF is 1, then also I see this behavior.
>>
>> >> Snippet from output1.txt attached earlier:
>> Executing indexed scan for [min(-9223372036854775808),
>> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
>> Executing indexed scan for (max(-9193352069377957523),
>> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
>> Executing indexed scan for (max(-9136021049555745100),
>> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
>> Executing indexed scan for (max(-8959555493872108621),
>> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
>>
>
> I'm not sure how your question here is different from the one above.
>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel  wrote:

>
> When coordinator fires indexed scan request to node 192.168.51.22, why
> don't it ask that node to check all of its (at least primary) ranges for
> the queried data, at once. Also, internally that node should be able to
> just do one scan through all of the ranges held by it, isn't it?
> (e.g. [min(-9223372036854775808), max(-9193352069377957523), and
> (max(-9136021049555745100), max(-8959555493872108621)], etc. ]
>
> Seems like it needs to query data in token order. So,
> min(-9223372036854775808), max(-*9193352069377957523*) on 192.168.51.22.
> But next range ((max(-*9193352069377957523*), max(-*9136021049555745100*)])
> is on 192.168.51.25 so fire query there. Then, next range  (max(-
> *9136021049555745100*), max(-8959555493872108621)] again on
> 192.168.51.22. Btw,, I'm not too sure regarding min/max or max/max in trace
> output.
>

The coordinator certainly could batch multiple range requests that are
going to the same replica.  It's an optimization that would primarily help
the empty table/high cardinality case, but you're welcome to open a
ticket.  3.0 is the earliest this would make it in.
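
A minimal sketch of the batching idea, assuming the coordinator already knows
which replica it will ask for each range. The function name batch_by_replica
is hypothetical and is not Cassandra code; the ranges and addresses are the
ones from the trace snippet in this thread.

```python
# (token range, replica) pairs in token order, copied from the trace snippet.
scans = [
    ("[min(-9223372036854775808), max(-9193352069377957523)]", "192.168.51.22"),
    ("(max(-9193352069377957523), max(-9136021049555745100)]", "192.168.51.25"),
    ("(max(-9136021049555745100), max(-8959555493872108621)]", "192.168.51.22"),
    ("(max(-8959555493872108621), max(-8929774302283364912)]", "192.168.51.25"),
]


def batch_by_replica(scans):
    """Group token ranges by target replica (hypothetical helper).

    Returns a dict mapping each replica to the list of ranges it would be
    asked to scan in a single request, keeping token order inside each list.
    """
    batches = {}
    for token_range, replica in scans:
        batches.setdefault(replica, []).append(token_range)
    return batches


batches = batch_by_replica(scans)
for replica, ranges in batches.items():
    print(replica, "would receive", len(ranges), "ranges in one request")
```

With the four ranges above, this turns four requests into two. The results
from the two batches would still have to be interleaved back into token order
on the coordinator.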


>
> I found below comment in
> https://issues.apache.org/jira/browse/CASSANDRA-4858.
> "The problem is that we have to scan the nodes in token order so we dont
> break the existing API's, if we do so then we are sending a lot more
> requests and waiting for the response than the number of nodes. "
> Don't understand the restriction though - "don't break the existing API's".
>

I think he's just saying that we have to make sure we return results in
token order (and if there's a limit on the query, return the first N
results when listed in token order).
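
A minimal sketch of why token order plus a limit leads to sequential
requests, assuming a hypothetical fetch(range) callable that returns the
matching rows for one range. This is an illustration, not the Cassandra
implementation.

```python
def scan_in_token_order(ranges, fetch, limit):
    """Visit ranges in token order and stop once `limit` rows are collected.

    Returns (rows, number_of_requests_sent).
    """
    rows = []
    requests = 0
    for token_range in ranges:
        requests += 1
        rows.extend(fetch(token_range))
        if len(rows) >= limit:
            break  # later ranges cannot contribute to the first `limit` rows
    return rows[:limit], requests


# Hypothetical data: range index -> matching rows in that range.
data = {0: ["a"], 1: [], 2: ["b", "c"], 3: ["d"]}
empty = {0: [], 1: [], 2: [], 3: []}

print(scan_in_token_order(range(4), data.__getitem__, 2))
print(scan_in_token_order(range(4), empty.__getitem__, 2))
```

When nothing matches, the loop never reaches the limit and every range has to
be queried, which is why the empty table is the expensive case.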


>
> With non-vnode, it only queries a particular node only one time..Btw, in
> the worst case, I understand secondary index query has to scan all the
> nodes in cluster sometime (empty table or high cardinality index?) but I
> don't understand why vnode makes it to scan the *same node *multiple
> times. If RF is 1, then also I see this behavior.
>
> >> Snippet from output1.txt attached earlier:
> Executing indexed scan for [min(-9223372036854775808),
> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
> Executing indexed scan for (max(-9193352069377957523),
> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
> Executing indexed scan for (max(-9136021049555745100),
> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
> Executing indexed scan for (max(-8959555493872108621),
> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
>

I'm not sure how your question here is different from the one above.




-- 
Tyler Hobbs
DataStax 


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks Robert for your input, but that sounds a little crazy to me. The
physical node is still the same, so why can't it just do one indexed scan for
all the contiguous or non-contiguous token ranges (vnodes) held by that
physical node? I suspect that it needs to respect token order for "some
reason", hence the multiple scans.

It would be great if you or someone could help clarify the doubts below (in
the context of the trace output):

When the coordinator fires an indexed scan request to node 192.168.51.22, why
doesn't it ask that node to check all of its (at least primary) ranges for
the queried data at once? Also, internally that node should be able to
just do one scan through all of the ranges it holds, shouldn't it?
(e.g. [min(-9223372036854775808), max(-9193352069377957523)] and
(max(-9136021049555745100), max(-8959555493872108621)], etc.)

It seems that it needs to query data in token order. So,
[min(-9223372036854775808), max(-9193352069377957523)] is on 192.168.51.22.
But the next range, (max(-9193352069377957523), max(-9136021049555745100)],
is on 192.168.51.25, so the query is fired there. Then the next range,
(max(-9136021049555745100), max(-8959555493872108621)], is again on
192.168.51.22. By the way, I'm not too sure about min/max vs. max/max in the
trace output.

I found the comment below in
https://issues.apache.org/jira/browse/CASSANDRA-4858.
"The problem is that we have to scan the nodes in token order so we dont
break the existing API's, if we do so then we are sending a lot more
requests and waiting for the response than the number of nodes. "
I don't understand the restriction, though: "don't break the existing API's".

With non-vnode, it queries a particular node only once. By the way, in the
worst case, I understand that a secondary index query sometimes has to scan
all the nodes in the cluster (empty table or high-cardinality index?), but I
don't understand why vnodes make it scan the *same node* multiple times.
I see this behavior even when RF is 1.

>> Snippet from output1.txt attached earlier:
Executing indexed scan for [min(-9223372036854775808),
max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
Executing indexed scan for (max(-9193352069377957523),
max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100),
max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
Executing indexed scan for (max(-8959555493872108621),
max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |


On Fri, Sep 19, 2014 at 2:54 PM, Robert Coli  wrote:

> On Fri, Sep 19, 2014 at 2:19 PM, DuyHai Doan  wrote:
>
>>  But does it implies that with vnodes, there are actually "extra work" to
>> do for scanning indices ?
>>
>
> Vnodes are just nodes, so they have all the
> problems-associated-with-many-nodes one would get with 256x as many nodes.
>
> =Rob
>
>


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Robert Coli
On Fri, Sep 19, 2014 at 2:19 PM, DuyHai Doan  wrote:

>  But does it implies that with vnodes, there are actually "extra work" to
> do for scanning indices ?
>

Vnodes are just nodes, so they have all the
problems-associated-with-many-nodes one would get with 256x as many nodes.
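
A rough sketch of the scale involved, under simplifying assumptions that are
not from the thread: RF=1, random 64-bit tokens, one range per token, and
neighboring ranges merged only when they have the same owner. The function
name count_scan_requests is hypothetical.

```python
import random


def count_scan_requests(nodes, tokens_per_node, seed=0):
    """Count range requests needed to walk the whole ring in token order."""
    rng = random.Random(seed)
    ring = []
    for node in range(nodes):
        for _ in range(tokens_per_node):
            ring.append((rng.getrandbits(64), node))
    ring.sort()
    owners = [node for _, node in ring]
    # A new request is needed each time the owner changes between neighbors.
    return 1 + sum(1 for a, b in zip(owners, owners[1:]) if a != b)


print("1 token per node:   ", count_scan_requests(6, 1))
print("256 tokens per node:", count_scan_requests(6, 256))
```

With one token per node, the walk needs one request per node. With 256 tokens
per node the ring has 1536 ranges, and because ownership is interleaved, only
a fraction of neighboring ranges can be merged.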

=Rob


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks Tyler for the details. I'm still trying to understand what you
described.

Just to simplify my question & what I don't understand:

When the coordinator fires an indexed scan request to node 192.168.51.22, why
doesn't it ask that node to check all of its (at least primary) ranges for
the queried data at once? Also, internally that node should be able to
just do one scan through all of the ranges it holds, shouldn't it?
(e.g. [min(-9223372036854775808), max(-9193352069377957523)] and
(max(-9136021049555745100), max(-8959555493872108621)], etc.)

It seems that it needs to query data in token order. So,
[min(-9223372036854775808), max(-9193352069377957523)] is on 192.168.51.22.
But the next range, (max(-9193352069377957523), max(-9136021049555745100)],
is on 192.168.51.25, so the query is fired there. Then the next range,
(max(-9136021049555745100), max(-8959555493872108621)], is again on
192.168.51.22. By the way, I'm not too sure about min/max vs. max/max in the
trace output.

I found the comment below in
https://issues.apache.org/jira/browse/CASSANDRA-4858.
"The problem is that we have to scan the nodes in token order so we dont
break the existing API's, if we do so then we are sending a lot more
requests and waiting for the response than the number of nodes. "
I don't understand the restriction, though: "don't break the existing API's".

With non-vnode, it queries a particular node only once. By the way, in the
worst case, I understand that a secondary index query sometimes has to scan
all the nodes in the cluster (empty table or high-cardinality index?), but I
don't understand why vnodes make it scan the *same node* multiple times.
I see this behavior even when RF is 1.

>> Snippet from output1.txt attached earlier:
Executing indexed scan for [min(-9223372036854775808),
max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
Executing indexed scan for (max(-9193352069377957523),
max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100),
max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
Executing indexed scan for (max(-8959555493872108621),
max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |

It would be great if you or someone could describe this further.

Thanks!!



On Fri, Sep 19, 2014 at 2:33 PM, Tyler Hobbs  wrote:

>
> On Fri, Sep 19, 2014 at 4:19 PM, DuyHai Doan  wrote:
>
>>
>>  But does it implies that with vnodes, there are actually "extra work" to
>> do for scanning indices ?
>>
>
> Yes.
>
>
>> If yes, is this "extra load" rather I/O bound or CPU bound ?
>>
>
> It doesn't necessarily change what the query is "bound" by, except perhaps
> in the case where you have almost no matching results.  There are more
> messages to dispatch and handle.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 4:19 PM, DuyHai Doan  wrote:

>
>  But does it implies that with vnodes, there are actually "extra work" to
> do for scanning indices ?
>

Yes.


> If yes, is this "extra load" rather I/O bound or CPU bound ?
>

It doesn't necessarily change what the query is "bound" by, except perhaps
in the case where you have almost no matching results.  There are more
messages to dispatch and handle.


-- 
Tyler Hobbs
DataStax 


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread DuyHai Doan
"It will merge requests to neighboring ranges when the same node is a
replica for both of them.  Without vnodes, this usually results in all
ranges for a node being merged.  With vnodes, merging still happens, but
not all ranges can be merged." -->

But does it imply that with vnodes, there is actually "extra work" to do
for scanning indices? If yes, is this "extra load" I/O bound or CPU bound?

On Fri, Sep 19, 2014 at 11:10 PM, Tyler Hobbs  wrote:

>
> On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel 
> wrote:
>
>>
>> Btw, there is no data in the table. Table is empty. Query is fired on the
>> empty table.
>>
>
> This is actually the worst case for secondary index lookups.
>
>
>>
>> From the tracing ouput, I don't understand why it's doing multiple scans
>> on one node. With non-vnode, there is only one scan per node & same query
>> works fine.
>>
>> If you look at the output1.txt attached earlier, coordinator is firing
>> index scan on a given node (for example, 192.168.51.22 in the below snippet
>> from output1.txt) multiple times for different token ranges. Why can't it
>> fire only one time? With non-vnode, it's only one time & query comes back
>> very fast.
>
>
> It will merge requests to neighboring ranges when the same node is a
> replica for both of them.  Without vnodes, this usually results in all
> ranges for a node being merged.  With vnodes, merging still happens, but
> not all ranges can be merged.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel  wrote:

>
> Btw, there is no data in the table. Table is empty. Query is fired on the
> empty table.
>

This is actually the worst case for secondary index lookups.


>
> From the tracing ouput, I don't understand why it's doing multiple scans
> on one node. With non-vnode, there is only one scan per node & same query
> works fine.
>
> If you look at the output1.txt attached earlier, coordinator is firing
> index scan on a given node (for example, 192.168.51.22 in the below snippet
> from output1.txt) multiple times for different token ranges. Why can't it
> fire only one time? With non-vnode, it's only one time & query comes back
> very fast.


It will merge requests to neighboring ranges when the same node is a
replica for both of them.  Without vnodes, this usually results in all
ranges for a node being merged.  With vnodes, merging still happens, but
not all ranges can be merged.
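
A minimal sketch of this merging rule, under a simplified model that is an
assumption rather than the actual Cassandra code: each range carries a set of
replicas, and two neighboring ranges are merged only while they still share
at least one replica. The name merge_adjacent and the toy rings are
hypothetical.

```python
def merge_adjacent(ranges):
    """Merge consecutive (start, end, replicas) ranges that share a replica."""
    merged = []
    for start, end, replicas in ranges:
        if merged:
            prev_start, prev_end, prev_replicas = merged[-1]
            common = prev_replicas & replicas
            if prev_end == start and common:
                # Same node can serve both ranges: extend the previous request.
                merged[-1] = (prev_start, end, common)
                continue
        merged.append((start, end, set(replicas)))
    return merged


# Toy ring without vnodes (RF=1): one contiguous range per node.
single_token = [(0, 100, {"A"}), (100, 200, {"B"}), (200, 300, {"C"})]

# Toy ring with vnodes (RF=1): ownership is interleaved, so few neighbors merge.
vnodes = [(0, 50, {"A"}), (50, 100, {"B"}), (100, 150, {"A"}),
          (150, 200, {"B"}), (200, 250, {"B"}), (250, 300, {"C"})]

print(len(merge_adjacent(single_token)), "requests without vnodes")
print(len(merge_adjacent(vnodes)), "requests with vnodes")
```

In the vnode ring only the two neighboring ranges owned by B are merged, so
node A and node B are each contacted more than once.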


-- 
Tyler Hobbs
DataStax 


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks folks for all your input! Yes, I totally agree that we need to have
a custom column family for indexing. However, we're trying to upgrade our
existing cluster from non-vnode to vnode, and queries using secondary
indexes, which used to work well with non-vnode, now break badly.

By the way, there is no data in the table. The table is empty, and the query
is fired on the empty table.

From the tracing output, I don't understand why it's doing multiple scans on
one node. With non-vnode, there is only one scan per node and the same query
works fine.

If you look at the output1.txt attached earlier, coordinator is firing
index scan on a given node (for example, 192.168.51.22 in the below snippet
from output1.txt) multiple times for different token ranges. Why can't it
fire only one time? With non-vnode, it's only one time & query comes back
very fast.

Executing indexed scan for [min(-9223372036854775808),
max(-9193352069377957523)] | 23:11:30,992 | *192.168.51.22* |
Executing indexed scan for (max(-9193352069377957523),
max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100),
max(-8959555493872108621)] | 23:11:30,999 | *192.168.51.22* |
Executing indexed scan for (max(-8959555493872108621),
max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
Executing indexed scan for (max(-8929774302283364912),
max(-8854653908608918942)] | 23:11:31,001 | *192.168.51.22* |
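
A small sketch for counting indexed scans per node from trace text shaped
like the snippet above. The regular expression is an assumption based only on
the lines shown in this thread (without the emphasis asterisks), not on a
documented trace format.

```python
import re
from collections import Counter

# Matches one "Executing indexed scan" entry, even when wrapped across lines.
SCAN = re.compile(
    r"Executing indexed scan for\s+(?P<range>[\[(].*?\])\s*\|"
    r"\s*(?P<time>[\d:,]+)\s*\|\s*(?P<node>\d+\.\d+\.\d+\.\d+)",
    re.DOTALL,
)


def scans_per_node(trace_text):
    """Return a Counter mapping node address to number of indexed scans."""
    return Counter(m.group("node") for m in SCAN.finditer(trace_text))


sample = """
Executing indexed scan for [min(-9223372036854775808),
max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
Executing indexed scan for (max(-9193352069377957523),
max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
Executing indexed scan for (max(-9136021049555745100),
max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
"""

counts = scans_per_node(sample)
for node, n in counts.most_common():
    print(node, n)
```

Running this over the full output1.txt would show how many separate scans
each node receives for a single query.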


On Fri, Sep 19, 2014 at 9:39 AM, Tyler Hobbs  wrote:

> Jon's advice is definitely still true, but in 2.1 there is
> https://issues.apache.org/jira/browse/CASSANDRA-1337, which parallelizes
> the fetching of ranges.
>
> On Fri, Sep 19, 2014 at 6:57 AM, Parag Patel 
> wrote:
>
>> Agreed.  We only use secondary indexes for column families that are
>> relatively small (~5k rows).  For anything larger, we store the data into a
>> wide row (but this depends on your data model)
>>
>> -Original Message-
>> From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On
>> Behalf Of Jonathan Haddad
>> Sent: Friday, September 19, 2014 4:01 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Slow down of secondary index query with VNODE (C* version
>> 1.2.18, jre6).
>>
>> Keep in mind secondary indexes in cassandra are not there to improve
>> performance, or even really be used in a serious user facing manner.
>>
>> Build and maintain your own view of the data, it'll be much faster.
>>
>>
>>
>> On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel 
>> wrote:
>> > Hi there,
>> >
>> > We are seeing extreme slow down (500ms to 1s) in query on secondary
>> > index with vnode. I'm seeing multiple secondary index scans on a given
>> > node in trace output when vnode is enabled. Without vnode, everything
>> is good.
>> >
>> > Cluster size: 6 nodes
>> > Replication factor: 3
>> > Consistency level: local_quorum. Same behavior happens with
>> > consistency level of ONE.
>> >
>> > Snippet from the trace output. Pls see the attached output1.txt for
>> > the full log. Are we hitting any bug? Do not understand why
>> > coordinator sends requests multiple times to the same node (e.g.
>> > 192.168.51.22 in below
>> > output) for different token ranges.
>> >
>> >>>>
>> >
>> > Executing indexed scan for [min(-9223372036854775808),
>> > max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 | Executing
>> > indexed scan for (max(-9193352069377957523),
>> > max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |  Executing
>> > indexed scan for (max(-9136021049555745100),
>> > max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |  Executing
>> > indexed scan for (max(-8959555493872108621),
>> > max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 | Executing
>> > indexed scan for (max(-8929774302283364912),
>> > max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 |  Executing
>> > indexed scan for (max(-8854653908608918942),
>> > max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 |
>> >   Executing indexed scan for (max(-8762620856967633953),
>> > max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 | Executing
>> > indexed scan for (max(-8668275030769104047),
>> > max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 |  Executing
>> > indexed scan for (max(-8659066486210615614),
>> > max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 |  Executing
>> > indexed scan for (max(-8419137646248370231),
>> > max(-8416786876632807845)] | 23:11:

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
Jon's advice is definitely still true, but in 2.1 there is
https://issues.apache.org/jira/browse/CASSANDRA-1337, which parallelizes
the fetching of ranges.
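
A generic sketch of fetching several ranges concurrently while still
returning rows in token order. This only illustrates the idea; it is not how
CASSANDRA-1337 is implemented, and fetch is a hypothetical callable that
returns the rows for one range.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_ranges_concurrently(ranges, fetch, concurrency):
    """Fetch ranges with up to `concurrency` in flight, keeping token order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map yields results in input order, so rows stay in the
        # order of `ranges` even if later ranges finish first.
        per_range = list(pool.map(fetch, ranges))
    return [row for rows in per_range for row in rows]


# Hypothetical fetch: each range index yields two rows.
rows = fetch_ranges_concurrently([0, 1, 2], lambda r: [r * 10, r * 10 + 1], 2)
print(rows)
```

The number of requests stays the same; only the waiting overlaps, so the
per-range message overhead discussed in this thread is still there.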

On Fri, Sep 19, 2014 at 6:57 AM, Parag Patel 
wrote:

> Agreed.  We only use secondary indexes for column families that are
> relatively small (~5k rows).  For anything larger, we store the data into a
> wide row (but this depends on your data model)
>
> -Original Message-
> From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On
> Behalf Of Jonathan Haddad
> Sent: Friday, September 19, 2014 4:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Slow down of secondary index query with VNODE (C* version
> 1.2.18, jre6).
>
> Keep in mind secondary indexes in cassandra are not there to improve
> performance, or even really be used in a serious user facing manner.
>
> Build and maintain your own view of the data, it'll be much faster.
>
>
>
> On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel  wrote:
> > Hi there,
> >
> > We are seeing extreme slow down (500ms to 1s) in query on secondary
> > index with vnode. I'm seeing multiple secondary index scans on a given
> > node in trace output when vnode is enabled. Without vnode, everything is
> good.
> >
> > Cluster size: 6 nodes
> > Replication factor: 3
> > Consistency level: local_quorum. Same behavior happens with
> > consistency level of ONE.
> >
> > Snippet from the trace output. Pls see the attached output1.txt for
> > the full log. Are we hitting any bug? Do not understand why
> > coordinator sends requests multiple times to the same node (e.g.
> > 192.168.51.22 in below
> > output) for different token ranges.
> >
> >>>>
> >
> > Executing indexed scan for [min(-9223372036854775808),
> > max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 | Executing
> > indexed scan for (max(-9193352069377957523),
> > max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |  Executing
> > indexed scan for (max(-9136021049555745100),
> > max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |  Executing
> > indexed scan for (max(-8959555493872108621),
> > max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 | Executing
> > indexed scan for (max(-8929774302283364912),
> > max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 |  Executing
> > indexed scan for (max(-8854653908608918942),
> > max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 |
> >   Executing indexed scan for (max(-8762620856967633953),
> > max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 | Executing
> > indexed scan for (max(-8668275030769104047),
> > max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 |  Executing
> > indexed scan for (max(-8659066486210615614),
> > max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 |  Executing
> > indexed scan for (max(-8419137646248370231),
> > max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25 |  Executing
> > indexed scan for (max(-8416786876632807845),
> > max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22 | Executing
> > indexed scan for (max(-8315889933848495185),
> > max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25 | Executing
> > indexed scan for (max(-8270922890152952193),
> > max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22 |  Executing
> > indexed scan for (max(-8260813759533312175),
> > max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25 |  Executing
> > indexed scan for (max(-8234845345932129353),
> > max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22 |
> >
> > Thanks,
> > Jay
> >
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>


RE: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Parag Patel
Agreed.  We only use secondary indexes for column families that are relatively
small (~5k rows).  For anything larger, we store the data in a wide row (but
this depends on your data model).

-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of 
Jonathan Haddad
Sent: Friday, September 19, 2014 4:01 AM
To: user@cassandra.apache.org
Subject: Re: Slow down of secondary index query with VNODE (C* version 1.2.18, 
jre6).

Keep in mind secondary indexes in cassandra are not there to improve 
performance, or even really be used in a serious user facing manner.

Build and maintain your own view of the data, it'll be much faster.



On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel  wrote:
> Hi there,
>
> We are seeing extreme slow down (500ms to 1s) in query on secondary 
> index with vnode. I'm seeing multiple secondary index scans on a given 
> node in trace output when vnode is enabled. Without vnode, everything is good.
>
> Cluster size: 6 nodes
> Replication factor: 3
> Consistency level: local_quorum. Same behavior happens with 
> consistency level of ONE.
>
> Snippet from the trace output. Pls see the attached output1.txt for 
> the full log. Are we hitting any bug? Do not understand why 
> coordinator sends requests multiple times to the same node (e.g. 
> 192.168.51.22 in below
> output) for different token ranges.
>
>>>>
>
> Executing indexed scan for [min(-9223372036854775808), 
> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 | Executing 
> indexed scan for (max(-9193352069377957523), 
> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |  Executing 
> indexed scan for (max(-9136021049555745100), 
> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |  Executing 
> indexed scan for (max(-8959555493872108621), 
> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 | Executing 
> indexed scan for (max(-8929774302283364912), 
> max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 |  Executing 
> indexed scan for (max(-8854653908608918942), 
> max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 |
>   Executing indexed scan for (max(-8762620856967633953), 
> max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 | Executing 
> indexed scan for (max(-8668275030769104047), 
> max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 |  Executing 
> indexed scan for (max(-8659066486210615614), 
> max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 |  Executing 
> indexed scan for (max(-8419137646248370231), 
> max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25 |  Executing 
> indexed scan for (max(-8416786876632807845), 
> max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22 | Executing 
> indexed scan for (max(-8315889933848495185), 
> max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25 | Executing 
> indexed scan for (max(-8270922890152952193), 
> max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22 |  Executing 
> indexed scan for (max(-8260813759533312175), 
> max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25 |  Executing 
> indexed scan for (max(-8234845345932129353), 
> max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22 |
>
> Thanks,
> Jay
>



--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jonathan Haddad
Keep in mind that secondary indexes in Cassandra are not there to improve
performance, or even really to be used in a serious user-facing manner.

Build and maintain your own view of the data; it'll be much faster.
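
A minimal in-memory sketch of what maintaining your own view means: every
write updates both the base data and a lookup structure keyed by the column
you want to query. The class and method names are hypothetical, and plain
dictionaries stand in for what would be a second table in Cassandra.

```python
from collections import defaultdict


class UserStore:
    """Base rows plus a hand-maintained lookup by last name."""

    def __init__(self):
        self.users = {}                      # user_id -> lastname
        self.by_lastname = defaultdict(set)  # lastname -> set of user_ids

    def upsert(self, user_id, lastname):
        old = self.users.get(user_id)
        if old is not None and old != lastname:
            # Remove the stale entry so the view stays consistent.
            self.by_lastname[old].discard(user_id)
        self.users[user_id] = lastname
        self.by_lastname[lastname].add(user_id)

    def find_by_lastname(self, lastname):
        # A single keyed lookup instead of a scan over every range.
        return sorted(self.by_lastname.get(lastname, set()))


store = UserStore()
store.upsert(1, "lau")
store.upsert(2, "lau")
store.upsert(1, "kim")
print(store.find_by_lastname("lau"))
```

The cost moves to the write path, which now has to keep two structures in
sync, in exchange for reads that touch only one key.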



On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel  wrote:
> Hi there,
>
> We are seeing extreme slow down (500ms to 1s) in query on secondary index
> with vnode. I'm seeing multiple secondary index scans on a given node in
> trace output when vnode is enabled. Without vnode, everything is good.
>
> Cluster size: 6 nodes
> Replication factor: 3
> Consistency level: local_quorum. Same behavior happens with consistency
> level of ONE.
>
> Snippet from the trace output. Pls see the attached output1.txt for the full
> log. Are we hitting any bug? Do not understand why coordinator sends
> requests multiple times to the same node (e.g. 192.168.51.22 in below
> output) for different token ranges.
>

>
> Executing indexed scan for [min(-9223372036854775808),
> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
> Executing indexed scan for (max(-9193352069377957523),
> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
>  Executing indexed scan for (max(-9136021049555745100),
> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
>  Executing indexed scan for (max(-8959555493872108621),
> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
> Executing indexed scan for (max(-8929774302283364912),
> max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 |
>  Executing indexed scan for (max(-8854653908608918942),
> max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 |
>   Executing indexed scan for (max(-8762620856967633953),
> max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 |
> Executing indexed scan for (max(-8668275030769104047),
> max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 |
>  Executing indexed scan for (max(-8659066486210615614),
> max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 |
>  Executing indexed scan for (max(-8419137646248370231),
> max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25 |
>  Executing indexed scan for (max(-8416786876632807845),
> max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22 |
> Executing indexed scan for (max(-8315889933848495185),
> max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25 |
> Executing indexed scan for (max(-8270922890152952193),
> max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22 |
>  Executing indexed scan for (max(-8260813759533312175),
> max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25 |
>  Executing indexed scan for (max(-8234845345932129353),
> max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22 |
>
> Thanks,
> Jay
>



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-18 Thread DuyHai Doan
Hello Jay

Your query is : "select * from keyspaceuser.company_testusers where
lastname = ‘lau’ LIMIT 1"

Why do you think that the slowness is due to vnodes and not to your query
asking for 10 000 results?

On Fri, Sep 19, 2014 at 3:33 AM, Jay Patel  wrote:

> Hi there,
>
> We are seeing extreme slow down (500ms to 1s) in query on secondary index
> with vnode. I'm seeing multiple secondary index scans on a given node in
> trace output when vnode is enabled. Without vnode, everything is good.
>
> Cluster size: 6 nodes
> Replication factor: 3
> Consistency level: local_quorum. Same behavior happens with consistency
> level of ONE.
>
> Snippet from the trace output. Pls see the attached output1.txt for the
> full log. Are we hitting any bug? Do not understand why coordinator sends
> requests multiple times to the same node (e.g. 192.168.51.22 in below
> output) for different token ranges.
>
> >>>
>
> Executing indexed scan for [min(-9223372036854775808),
> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
> Executing indexed scan for (max(-9193352069377957523),
> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
>  Executing indexed scan for (max(-9136021049555745100),
> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
>  Executing indexed scan for (max(-8959555493872108621),
> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
> Executing indexed scan for (max(-8929774302283364912),
> max(-8854653908608918942)] | 23:11:31,001 | 192.168.51.22 |
>  Executing indexed scan for (max(-8854653908608918942),
> max(-8762620856967633953)] | 23:11:31,002 | 192.168.51.25 |
>   Executing indexed scan for (max(-8762620856967633953),
> max(-8668275030769104047)] | 23:11:31,003 | 192.168.51.22 |
> Executing indexed scan for (max(-8668275030769104047),
> max(-8659066486210615614)] | 23:11:31,003 | 192.168.51.25 |
>  Executing indexed scan for (max(-8659066486210615614),
> max(-8419137646248370231)] | 23:11:31,004 | 192.168.51.22 |
>  Executing indexed scan for (max(-8419137646248370231),
> max(-8416786876632807845)] | 23:11:31,005 | 192.168.51.25 |
>  Executing indexed scan for (max(-8416786876632807845),
> max(-8315889933848495185)] | 23:11:31,006 | 192.168.51.22 |
> Executing indexed scan for (max(-8315889933848495185),
> max(-8270922890152952193)] | 23:11:31,006 | 192.168.51.25 |
> Executing indexed scan for (max(-8270922890152952193),
> max(-8260813759533312175)] | 23:11:31,007 | 192.168.51.22 |
>  Executing indexed scan for (max(-8260813759533312175),
> max(-8234845345932129353)] | 23:11:31,008 | 192.168.51.25 |
>  Executing indexed scan for (max(-8234845345932129353),
> max(-8216636461332030758)] | 23:11:31,008 | 192.168.51.22 |
>
> Thanks,
> Jay
>
>