[
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569392#comment-14569392
]
Branimir Lambov edited comment on CASSANDRA-8915 at 6/3/15 3:00 PM:
--------------------------------------------------------------------
I wasn't completely sure of the correctness of the flagging approach and I
wrote the following proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is
defined by two functions, {{ρ(N)}} which stands for "parent of N", and {{σ(N)}}
which stands for "equals parent". We will use 'element' and 'node'
interchangeably as each element E also defines a node using the ρ function with
parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator = as used below
denotes element equality (not identity), distinct elements are allowed to be
equal.
We define "ep-heap leading to M" to be a binary tree of nodes where each node N
satisfies {{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full
ep-heaps are ep-heaps leading to ┬, where '┬' is assumed to be a special value
strictly smaller than any other (thus {{σ(R)}} is always false for the root of
a full ep-heap). For simplicity we will take all heaps to end at a special
element called '⊥' whose value is assumed to be greater than any other value.
The critical step here is adding a new value {{C > M}} to combine two existing
well-formed ep-heaps with roots P and R leading to the same M into a bigger
single well-formed ep-heap leading to M. Without loss of generality we can take
{{P ≤ R}} (otherwise we can swap the places of P and R in the discussion below).
We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top by setting
{{ρ(P), ρ(R) := C}}, {{ρ(C) := M}} and {{σ(C) := false}} and leaving {{σ(P),
σ(R) := false}}. The ep-heap rooted at C is well-formed leading to M.
# {{C = P}} (necessarily {{¬σ(P), ¬σ(R)}}): We place C at the top, setting
{{ρ(P), ρ(R) := C}}, {{ρ(C) := M}}; {{σ(P) := true}}, {{σ(R) := (P = R)}} and
{{σ(C) := false}}. The heap rooted at C is well-formed as {{C = P & σ(P)}} as
well as {{C = P ≤ R}} and {{C = R}} if and only if {{σ(R)}}.
# {{C > P}} ({{σ(P), σ(P)}} can be true or false): Place P at the top: set
{{ρ(R) := P}} and if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave
{{σ(R)}} unchanged; use the algorithm recursively to merge the two ep-subheaps
leading to P at its children and the new value {{C > P}} into an ep-heap
leading to P. Let its root be Q. The necessary conditions for an element of an
ep-heap hold for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}}
in which case {{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}}
and {{σ(R)}} is explicitly set to be true if and only if {{P = R}}. We also
have {{ρ(P) = M}} as well as {{P = M}} if and only if {{σ(P)}}, thus the
constructed ep-heap rooted at P leads to M.
The recursion always ends for finite heaps as we must reach the first case due
to the ordering of ⊥.
For performance the order of comparing elements and identifying the case to use
can be changed to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of
{{σ(P)}}/{{σ(R)}} necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of
{{σ(P)}}/{{σ(R)}} necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.
Once an ep-heap with root R is formed, it trivially follows that for each
element E in the heap {{E = R}} if and only if σ is true for E and the chain of
parents of E leading to but not including the root. To consume the equal
elements we can thus walk the sub-tree formed by the children which have σ set.
To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For
each of them:
-- Advance the element. Let the new value be C.
-- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite
{{ρ(P)}} and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
-- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
-- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is
reached.
\* The algorithm never actually uses the values of the parent function ρ, only
the fact that σ is set correctly with respect to that parent. This means this
step does not need to be actually performed.
{panel}
Updated [the
branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
to use this solution and cleaned the code a bit.
was (Author: blambov):
I wasn't completely sure of the correctness of the flagging approach and I
wrote the following proof to make sure I'm not missing something:
{panel}
In the discussion below a set of ep-heaps over some space of elements is
defined by two functions, {{ρ(N)}} which stands for "parent of N", and {{σ(N)}}
which stands for "equals parent". We will use 'element' and 'node'
interchangeably as each element E also defines a node using the ρ function with
parent {{ρ(E)}} and children {{\{x: ρ(X) ≡ E\}}}. The operator = as used below
denotes element equality (not identity), distinct elements are allowed to be
equal.
We define "ep-heap leading to M" to be a binary tree of nodes where each node N
satisfies {{ρ(N) ≤ N & σ(N) ↔ ρ(N) = N}} and the root has M as its parent. Full
ep-heaps are ep-heaps leading to ┬, where '┬' is assumed to be a special value
strictly smaller than any other (thus {{σ(R)}} is always false for the root of
a full ep-heap). For simplicity we will take all heaps to end at a special
element called '⊥' whose value is assumed to be greater than any other value.
The critical step here is adding a new value {{C > M}} to combine two existing
well-formed ep-heaps with roots P and R leading to the same M into a bigger
single well-formed ep-heap leading to M. Without loss of generality we can take
{{P ≤ R}} (otherwise we can swap the places of P and R in the discussion below).
We distinguish three cases:
# {{C < P}} (necessarily {{¬σ(P)}}): We place C at the top by setting {{ρ(P),
ρ(R) := C}}, {{ρ(C) := M}} and leaving {{σ(P), σ(R) := false}} (Note that
{{σ(R)}} cannot be true as that contradicts with {{C > M}}). {{σ(C) := false}}.
The ep-heap rooted at C is well-formed leading to M.
# {{C = P}} (necessarily {{¬σ(P)}}): We place C at the top, setting {{ρ(P),
ρ(R) := C}}, {{ρ(C) := M}}, {{σ(P) := true}} and {{σ(R) := (P = R)}}. {{σ(C) :=
false}}. The heap rooted at C is well-formed as {{C = P & σ(P)}} as well as {{C
= P ≤ R}} and {{C = R}} if and only if {{σ(R)}}.
# {{C > P}} ({{σ(P)}} can be true or false): Place P at the top: set {{ρ(R) :=
P}} and if {{¬σ(P)}}, set {{σ(R) := (P = R)}}, otherwise leave {{σ(R)}}
unchanged; use the algorithm recursively to merge the two ep-subheaps leading
to P at its children and the new value {{C > P}} into an ep-heap leading to P.
Let its root be Q. The necessary conditions for an element of an ep-heap hold
for Q. They are also true for R as {{P ≤ R}} and either {{σ(P)}} in which case
{{P = M}} and {{σ(R)}} if and only if {{R = M = P}}, or {{¬σ(P)}} and {{σ(R)}}
is explicitly set to be true if and only if {{P = R}}. We also have {{ρ(P) =
M}} as well as {{P = M}} if and only if {{σ(P)}}, thus the constructed ep-heap
rooted at P leads to M.
The recursion always ends for finite heaps as we must reach the first case due
to the ordering of ⊥.
For performance the order of comparing elements and identifying the case to use
can be changed to sequentially checking:
- {{σ(P)}}: apply pt. 3 for P and return (P must be ≤ R). No update of
{{σ(P)}}/{{σ(R)}} necessary.
- {{σ(R)}}: apply pt. 3 for R and return (R must now be < P). No update of
{{σ(P)}}/{{σ(R)}} necessary.
- compare P and R.
- If {{R < P}}, perform the next steps with P and R swapped.
- compare C and P.
- if {{C < P}} apply pt. 1.
- if {{C = P}} apply pt. 2.
- if {{C > P}} apply pt. 3.
Once an ep-heap with root R is formed, it trivially follows that for each
element E in the heap {{E = R}} if and only if σ is true for E and the chain of
parents of E leading to but not including the root. To consume the equal
elements we can thus walk the sub-tree formed by the children which have σ set.
To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For
each of them:
-- Advance the element. Let the new value be C.
-- The children P, R of the element must have {{¬σ(P)}}, {{¬σ(R)}}. Overwrite
{{ρ(P)}} and {{ρ(R)}} with ┬*. P and R now form ep-heaps leading to ┬.
-- Use the algorithm to combine P, R and C into a new ep-heap leading to ┬.
-- Let its root be Q. Because the heap leads to ┬ we have {{¬σ(Q)}}.
- Continue with the next deepest level of the hierarchy until the top is
reached.
\* The algorithm never actually uses the values of the parent function ρ, only
the fact that σ is set correctly with respect to that parent. This means this
step does not need to be actually performed.
{panel}
Updated [the
branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
to use this solution and cleaned the code a bit.
> Improve MergeIterator performance
> ---------------------------------
>
> Key: CASSANDRA-8915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
> Priority: Minor
> Labels: compaction, performance
> Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a
> pair of {{poll}}+{{add}} operations for every item in the resulting sequence.
> This is quite inefficient as {{poll}} necessarily applies at least {{log N}}
> comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}},
> for example in the case where the inputs largely don't overlap (where {{N}}
> is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform
> replacement of the top of the queue in a single step, which will very often
> complete after a couple of comparisons and in the worst case scenarios will
> match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with
> limited overlap (e.g. levelled compaction).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)