Luo Chen has posted comments on this change.

Change subject: Avoid always merging old components in prefix policy
......................................................................


Patch Set 7:

> In terms of finding what components can be merged, we are almost
 > the same -- it's an approximation of level-based merge.
 > 
 > >> >100M, >100M , ..., >100M, 61M, 25M, 11M, 5M, 1M, 1M, 1M, 1M
 > 
 > That's probably not the worst adversary case.  Suppose
 > MaxMergableSize = 100M, MaxToleranceCount = 5, the worst adversary
 > case that I can think of is:
 > 
 > C1(100M), C2(20M), C3(20M), C4(20M), C5(20M), C6(4M), C7(4M),
 > C8(4M), C9(4M), C10(0.75M), C11(0.75M), C12(0.75M), C13 (0.75M)
 > 
 > When C13 is added (from a FLUSH), the algorithm probably doesn't
 > need to search a merge window from C2?
 > When C4 is added (from a MERGE), the algorithm doesn't need to
 > check any components after C4?  In addition, it only needs to check
 > a merge-able window ending at C4.  At the moment, if there is a
 > mergeable window ending at C3, that should be identified when C3 is
 > added.
 > 
 > My point is that:
 > For merge-resulting C, a merge should only be triggered by the
 > *last* component in the merge-able window.  For example, when C4 is
 > added, you only need to search backward from C4.
 > 
 > For flush-resulting C, it might not be the case (because "finding
 > mergeable window" can be skipped if there's an ongoing merge), but
 > it's easy to tackle.  For example, when there's C13 is added, you
 > just need to search forward from C10 instead C2.

Got it. However, this sequence is not a valid layout under the this policy. 
Suppose C6 is added:

C1(100M), C2(20M), C3(20M), C4(20M), C5(20M), C6(4M)

then C2 - C6 would be merged since 5 >= MaxToleranceCount and C2.size < 1.2 * 
(C3.size + ...+ C6.size), and the result would be: C1(100M), C2'(84M). The same 
argument applies to C6(4M), C7(4M), C8(4M), C9(4M), C10(0.75M).

This improvement would be helpful if we adopt the level-based policy, where we 
could have multiple disk components with similar sizes at each level. However, 
for this prefix policy (a variation based on HBase), the component size grows 
exponentially except the newly flushed ones (otherwise, two components with 
similar sizes would be merged in the next round because of the ratio). Thus, I 
still think this improvement wouldn't bring too much benefit for the current 
policy, and here (as well as HBase) actually allows a component to be merged 
with both older and younger components.

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1818
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I464da3fed38cded0aee7b319a35664eae069a2ba
Gerrit-PatchSet: 7
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Luo Chen <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: No

Reply via email to