[jira] [Updated] (CASSANDRA-7805) Performance regression in multi-get (in clause) due to automatic paging

Philip Thompson (JIRA) Wed, 18 Mar 2015 14:03:57 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Philip Thompson updated CASSANDRA-7805:
---------------------------------------
    Fix Version/s: 2.0.14

> Performance regression in multi-get (in clause) due to automatic paging
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-7805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7805
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: J.B. Langston
>            Priority: Minor
>             Fix For: 2.0.14
>
>
> Comparative benchmarking of 1.2 vs. 2.0 shows a regression in multi-get (in 
> clause) queries due to automatic paging.  Take the following example:
> select myId, col1, col2, col3 from myTable where col1 = 'xyz' and myId IN 
> (id1, id1, ..., id100); // primary key is (myId, col1)
> We were suprised to see that in 2.0, the above query was giving an order of 
> magnitude worse performance than in 1.2. Digging in, I believe it is due to 
> the issue described in the comment at the top of MultiPartitionPager.java 
> (v2.0.9): "Note that this is not easy to make efficient. Indeed, we need to 
> page the first command fully before returning results from the next one, but 
> if the result returned by each command is small (compared to pageSize), 
> paging the commands one at a time under-performs compared to parallelizing."
> The perf regression is due to the new paging feature in 2.0. The server is 
> executing the read for each id in the IN clause *sequentially* in order to 
> implement the paging semantics.
> The wisdom of using multi-get like this has been debated in other forums, but 
> the thing that's unfortunate from a user point of view, is if they had a 
> bunch of code working against 1.2 and then they upgrade their cluster to 2.0 
> and all of a sudden start to see an order of magnitude or worse perf 
> regression. That will be perceived as a problem. I think it would surprise 
> anyone not familiar with the code that the separate reads for the IN clause 
> would be done sequentially rather than in parallel.
> As a workaround, disable paging in the Java driver by setting fetchSize to 
> Integer.MAX_VALUE on your QueryOptions



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-7805) Performance regression in multi-get (in clause) due to automatic paging

Reply via email to