[jira] [Comment Edited] (CASSANDRA-8915) Improve MergeIterator performance

Branimir Lambov (JIRA) Wed, 03 Jun 2015 08:08:08 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569392#comment-14569392
 ]


Branimir Lambov edited comment on CASSANDRA-8915 at 6/3/15 3:00 PM:
--------------------------------------------------------------------

I wasn't completely sure of the correctness of the flagging approach and I 
wrote the following proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is 
defined by two functions, {{ρ(N)}} which stands for "parent of N", and {{σ(N)}} 
which stands for "equals parent". We will use 'element' and 'node' 
interchangeably as each element E also defines a node using the ρ function with 
parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator = as used below 
denotes element equality (not identity), distinct elements are allowed to be 
equal.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N 
satisfies {{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full 
ep-heaps are ep-heaps leading to ┬, where '┬' is assumed to be a special value 
strictly smaller than any other (thus {{σ(R)}} is always false for the root of 
a full ep-heap). For simplicity we will take all heaps to end at a special 
element called '⊥' whose value is assumed to be greater than any other value.

The critical step here is adding a new value {{C > M}} to combine two existing 
well-formed ep-heaps with roots P and R leading to the same M into a bigger 
single well-formed ep-heap leading to M. Without loss of generality we can take 
{{P ≤ R}} (otherwise we can swap the places of P and R in the discussion below).

We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top by setting 
{{ρ(P), ρ(R) := C}}, {{ρ(C) := M}} and {{σ(C) := false}} and leaving {{σ(P), 
σ(R) := false}}. The ep-heap rooted at C is well-formed leading to M.
# {{C = P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top, setting 
{{ρ(P), ρ(R) := C}}, {{ρ(C) := M}}; {{σ(P) := true}}, {{σ(R) := (P = R)}} and 
{{σ(C) := false}}. The heap rooted at C is well-formed as {{C = P & σ(P)}} as 
well as {{C = P ≤ R}} and {{C = R}} if and only if {{σ(R)}}.
# {{C > P}} ({{σ(P), σ(P)}} can be true or false): Place P at the top: set 
{{ρ(R) := P}} and if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave 
{{σ(R)}} unchanged; use the algorithm recursively to merge the two ep-subheaps 
leading to P at its children and the new value {{C > P}} into an ep-heap 
leading to P. Let its root be Q. The necessary conditions for an element of an 
ep-heap hold for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}} 
in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} 
and {{σ(R)}} is explicitly set to be true if and only if {{P = R}}. We also 
have {{ρ(P) = M}} as well as {{P = M}} if and only if {{σ(P)}}, thus the 
constructed ep-heap rooted at P leads to M.

The recursion always ends for finite heaps as we must reach the first case due 
to the ordering of ⊥.

For performance the order of comparing elements and identifying the case to use 
can be changed to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of 
{{σ(P)}}/{{σ(R)}} necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of 
{{σ(P)}}/{{σ(R)}} necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each 
element E in the heap {{E = R}} if and only if σ is true for E and the chain of 
parents of E leading to but not including the root. To consume the equal 
elements we can thus walk the sub-tree formed by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For 
each of them:
  -- Advance the element. Let the new value be C.
  -- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite 
{{ρ(P)}} and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
  -- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
  -- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is 
reached.

\* The algorithm never actually uses the values of the parent function ρ, only 
the fact that σ is set correctly with respect to that parent. This means this 
step does not need to be actually performed.
{panel}

Updated [the 
branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
 to use this solution and cleaned the code a bit.


was (Author: blambov):
I wasn't completely sure of the correctness of the flagging approach and I 
wrote the following proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is 
defined by two functions, {{ρ(N)}} which stands for "parent of N", and {{σ(N)}} 
which stands for "equals parent". We will use 'element' and 'node' 
interchangeably as each element E also defines a node using the ρ function with 
parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator = as used below 
denotes element equality (not identity), distinct elements are allowed to be 
equal.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N 
satisfies {{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full 
ep-heaps are ep-heaps leading to ┬, where '┬' is assumed to be a special value 
strictly smaller than any other (thus {{σ(R)}} is always false for the root of 
a full ep-heap). For simplicity we will take all heaps to end at a special 
element called '⊥' whose value is assumed to be greater than any other value.

The critical step here is adding a new value {{C > M}} to combine two existing 
well-formed ep-heaps with roots P and R leading to the same M into a bigger 
single well-formed ep-heap leading to M. Without loss of generality we can take 
{{P ≤ R}} (otherwise we can swap the places of P and R in the discussion below).

We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P)}}): We place C at the top by setting {{ρ(P), 
ρ(R) := C}}, {{ρ(C) := M}} and leaving {{σ(P), σ(R) := false}} (Note that 
{{σ(R)}} cannot be true as that contradicts with {{C > M}}). {{σ(C) := false}}. 
The ep-heap rooted at C is well-formed leading to M.
# {{C = P}} (necessarily {{¬σ(P)}}): We place C at the top, setting {{ρ(P), 
ρ(R) := C}}, {{ρ(C) := M}}, {{σ(P) := true}} and {{σ(R) := (P = R)}}. {{σ(C) := 
false}}. The heap rooted at C is well-formed as {{C = P & σ(P)}} as well as {{C 
= P ≤ R}} and {{C = R}} if and only if {{σ(R)}}.
# {{C > P}} ({{σ(P)}} can be true or false): Place P at the top: set {{ρ(R) := 
P}} and if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}} 
unchanged; use the algorithm recursively to merge the two ep-subheaps leading 
to P at its children and the new value {{C > P}} into an ep-heap leading to P. 
Let its root be Q. The necessary conditions for an element of an ep-heap hold 
for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}} in which case 
{{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} and {{σ(R)}} 
is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P) = 
M}} as well as {{P = M}} if and only if {{σ(P)}}, thus the constructed ep-heap 
rooted at P leads to M.

The recursion always ends for finite heaps as we must reach the first case due 
to the ordering of ⊥.

For performance the order of comparing elements and identifying the case to use 
can be changed to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of 
{{σ(P)}}/{{σ(R)}} necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of 
{{σ(P)}}/{{σ(R)}} necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each 
element E in the heap {{E = R}} if and only if σ is true for E and the chain of 
parents of E leading to but not including the root. To consume the equal 
elements we can thus walk the sub-tree formed by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For 
each of them:
  -- Advance the element. Let the new value be C.
  -- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite 
{{ρ(P)}} and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
  -- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
  -- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is 
reached.

\* The algorithm never actually uses the values of the parent function ρ, only 
the fact that σ is set correctly with respect to that parent. This means this 
step does not need to be actually performed.
{panel}

Updated [the 
branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
 to use this solution and cleaned the code a bit.

> Improve MergeIterator performance
> ---------------------------------
>
>                 Key: CASSANDRA-8915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: compaction, performance
>             Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a 
> pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
> This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
> comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
> for example in the case where the inputs largely don't overlap (where {{N}} 
> is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform 
> replacement of the top of the queue in a single step, which will very often 
> complete after a couple of comparisons and in the worst case scenarios will 
> match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with 
> limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-8915) Improve MergeIterator performance

Reply via email to