[ https://issues.apache.org/jira/browse/CASSANDRA-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Oskarsson updated CASSANDRA-821:
--------------------------------------

    Attachment: CASSANDRA-821.patch

This patch is very much a work in progress; comments are welcome.

The main objective is to merge the operation's two loops into one. The current 
code reads the keys first and then fetches the data key by key, opening and 
closing files for each key in the process. This patch moves that into a single 
loop and shares the open files.
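
As a rough illustration of the before/after shape (the class and method names 
below are stand-ins for the real SSTable reading machinery, not the actual 
Cassandra code):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: readRow() and the raw RandomAccessFile
    // stand in for the real SSTable reading machinery.
    public class RangeScanSketch
    {
        // Before: fetch the data key by key, opening and closing the
        // data file once per key.
        List<String> fetchPerKey(List<String> keys, String path) throws IOException
        {
            List<String> rows = new ArrayList<String>();
            for (String key : keys)
            {
                RandomAccessFile file = new RandomAccessFile(path, "r"); // reopened per key
                try
                {
                    rows.add(readRow(file, key));
                }
                finally
                {
                    file.close();
                }
            }
            return rows;
        }

        // After: a single loop that shares one open file across all keys.
        List<String> fetchSharedFile(List<String> keys, String path) throws IOException
        {
            List<String> rows = new ArrayList<String>();
            RandomAccessFile file = new RandomAccessFile(path, "r"); // opened once
            try
            {
                for (String key : keys)
                    rows.add(readRow(file, key));
            }
            finally
            {
                file.close();
            }
            return rows;
        }

        // Placeholder for deserializing the row stored under a key.
        private String readRow(RandomAccessFile file, String key) throws IOException
        {
            return key;
        }
    }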

I have compared stress.py results between the patched version and a clean 
trunk, with impressive gains. However, that was only on my desktop, where I 
had a hard time getting results at all without timeouts, so I won't post any 
numbers until I (or someone else) can verify them on a better setup.

Implementation details:
 * Renames IteratingRow to SSTableIteratingRow and turns IteratingRow into an 
interface, so that we can handle the memtables in a similar fashion (see the 
sketch after this list).
 * Adds a new constructor to SSTableSliceIterator and SSTableNamesIterator to 
accept an already-open file.
 * The bulk of the changes is in the getKeyRange method, adapting it to the 
new way of reading data.
 * I have not looked at hooking up the caching layer yet.
 * The patch passes all tests, but I suspect there are bugs and things I have 
missed.
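
As a rough sketch of the interface split and the already-open-file constructor 
pattern described above (fields and signatures are illustrative, not the 
patch's exact code):

    import java.io.RandomAccessFile;

    // Illustrative sketch: the former concrete class becomes an interface
    // that both SSTable-backed and memtable-backed rows can implement.
    interface IteratingRow
    {
        String getKey();
    }

    // SSTable-backed rows take an already-open file in the constructor,
    // so the caller can share one handle across many rows.
    class SSTableIteratingRow implements IteratingRow
    {
        private final RandomAccessFile file;
        private final String key;

        SSTableIteratingRow(RandomAccessFile file, String key)
        {
            this.file = file; // shared handle, owned and closed by the caller
            this.key = key;
        }

        public String getKey()
        {
            return key;
        }
    }

    // Memtable-backed rows can now be handled through the same interface.
    class MemtableIteratingRow implements IteratingRow
    {
        private final String key;

        MemtableIteratingRow(String key)
        {
            this.key = key;
        }

        public String getKey()
        {
            return key;
        }
    }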



> get_range_slice performance
> ---------------------------
>
>                 Key: CASSANDRA-821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-821
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: CASSANDRA-821.patch
>
>
> get_range_slice performance isn't that great.  CASSANDRA-799 helped in the 
> case when the memtable isn't flushed, but overall the operations per second 
> and latency are still pretty bad.  On a quad-core node with a trivial amount 
> of data I see around 130-150 ops per second with stress.py set to slice 100 
> keys at a time, and latency is 300-500ms.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
