[ 
https://issues.apache.org/jira/browse/CASSANDRA-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stu Hood updated CASSANDRA-2062:
--------------------------------

    Attachment: 0005-Remove-temporary-instrumentation-and-CollatingIterator.txt
                0004-Port-LazilyCompactedRow-ReducingKeyIterator-RangeSlice.txt

* 0001 - Adds MergeIterator, which uses a priority queue to perform a 
{{O(2*log(M)*N)}} merge in the same space. Also guarantees to only call hasNext 
once per item.
* 0002 - Adds some debug output for checking the number of comparisons 
performed during compaction
* 0003 - Replaces CollatingIterator for Compaction
* 0004 - Replaces CollatingIterator elsewhere
* 0005 - Removes debug output and CollatingIterator from FBUtilities

The number of comparisons needed to compact {{M=sstables=100}}, 
{{N=rows=80000}} drops from ~8 million ({{M*N}}) to ~1 million 
({{2*log(M)*N}}), as expected.
{noformat}# MergeIterator
1 for org.apache.cassandra.io.CompactionIterator@7dc21ece
org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=1 
colsper=200000: 2758 ms
399999 for org.apache.cassandra.io.CompactionIterator@20ca5bff
org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=200000 
colsper=1: 4504 ms
917546 for org.apache.cassandra.io.CompactionIterator@6360f5bf
org.apache.cassandra.db.LongCompactionSpeedTest: sstables=100 rowsper=800 
colsper=5: 1021 ms


# CollatingIterator
1 for org.apache.cassandra.io.CompactionIterator@5f873eb2
org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=1 
colsper=200000: 1732 ms
399999 for org.apache.cassandra.io.CompactionIterator@357c7988
org.apache.cassandra.db.LongCompactionSpeedTest: sstables=2 rowsper=200000 
colsper=1: 4499 ms
7915050 for org.apache.cassandra.io.CompactionIterator@2ae0420b
org.apache.cassandra.db.LongCompactionSpeedTest: sstables=100 rowsper=800 
colsper=5: 1427 ms{noformat}

> Use more efficient merge algorithm
> ----------------------------------
>
>                 Key: CASSANDRA-2062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2062
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Minor
>             Fix For: 0.7.2
>
>         Attachments: 0001-Improved-iterator-for-merging-sorted-iterators.txt, 
> 0002-Quickie-instrumentation-for-comparisons.txt, 
> 0003-Replace-Collating-with-Merge-in-CompactionIterator.txt, 
> 0004-Port-LazilyCompactedRow-ReducingKeyIterator-RangeSlice.txt, 
> 0005-Remove-temporary-instrumentation-and-CollatingIterator.txt
>
>
> For {{M}} iterators containing {{N}} total items, 
> commons.collections.CollatingIterator performs a {{O(M*N)}} merge, and calls 
> hasNext multiple times per returned value. We can do better.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to