Yingyi Bu has posted comments on this change.

Change subject: Avoid always merging old components in prefix policy
......................................................................


Patch Set 6:

(3 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1818/6/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java
File 
hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java:

PS6, Line 201: if (mergableIndexes != null) {
             :             return mergableIndexes.getRight() - 
mergableIndexes.getLeft() + 1;
             :         } else {
             :             return 0;
             :         }
return mergableIndexes == null ? 0
        : mergableIndexes.getRight() - mergableIndexes.getLeft() + 1;


PS6, Line 248: for (int i = startIndex; i <= endIndex; i++) {
             :             mergableComponents.add(immutableComponents.get(i));
             :         }
mergableComponents.addAll(immutableComponents.subList(startIndex, endIndex+1)) ?
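
For reference, here is a minimal standalone sketch of why the two forms should be equivalent. The class and method names are hypothetical, and plain strings stand in for ILSMDiskComponent. The point to watch is that List.subList takes an exclusive upper bound, hence endIndex + 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListDemo {
    // Original form: copy the inclusive range [startIndex, endIndex] element by element.
    static <T> List<T> copyWithLoop(List<T> source, int startIndex, int endIndex) {
        List<T> result = new ArrayList<>();
        for (int i = startIndex; i <= endIndex; i++) {
            result.add(source.get(i));
        }
        return result;
    }

    // Suggested form: subList's upper bound is exclusive, so pass endIndex + 1.
    static <T> List<T> copyWithSubList(List<T> source, int startIndex, int endIndex) {
        List<T> result = new ArrayList<>();
        result.addAll(source.subList(startIndex, endIndex + 1));
        return result;
    }

    public static void main(String[] args) {
        // Strings stand in for disk components (an assumption for illustration).
        List<String> components = Arrays.asList("c0", "c1", "c2", "c3", "c4");
        System.out.println(copyWithLoop(components, 1, 3));
        System.out.println(copyWithSubList(components, 1, 3));
    }
}
```

Both calls in main select the same three elements, so the loop can be replaced without changing behavior as long as startIndex <= endIndex.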


PS6, Line 273:  private Pair<Integer, Integer> 
getMergableComponentsIndex(List<ILSMDiskComponent> immutableComponents) 
It feels like there is some repeated work done in this method, because it is 
called for each new component that results from a merge.

I think we can potentially make this method an O(n) operation if we add a 
parameter newComponent to diskComponentAdded(final ILSMIndex index, boolean 
fullMergeIsRequested).  (Note that there are scenarios where we are interested 
in a large number of components, e.g., Cloudberry (CB), for either better read 
performance or better ingestion performance.)

I'm thinking that maybe we can differentiate whether the new component resulted 
from a FLUSH or from a MERGE, based on their sizes, for example.  Let's say we 
keep the list of components ordered from younger to older (without reversing):

-- for a new FLUSH-result component, we just need a sliding window to decide 
whether it needs to be merged with older FLUSH-result components.  (We need to 
do that because this method might not be called for a FLUSH-result component 
because of line 58.)

-- for a new MERGE-result component Cm, we just need to check its preceding 
component to identify a contiguous mergeable window (if any).


We can rely on three properties to simplify the mergeable window selection here:

1.  This method is called once each time a MERGE-result component is added;

2.  A FLUSH-result component should probably only merge with a FLUSH-result 
component;

3.  Whenever a new component is added, either from FLUSH or MERGE, we only 
identify a mergeable window (with older components) starting from that component.
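
To make property 3 concrete, here is a rough single-pass sketch of a window anchored at the new component. Everything in it is an assumption for illustration: the class name, the findWindow helper, and in particular the stopping criteria (a cap on total size and a minimum component count), which are not the actual conditions used by PrefixMergePolicy. Component sizes stand in for the components themselves, ordered from younger to older.

```java
import java.util.Arrays;
import java.util.List;

public class WindowSketch {
    /**
     * Hypothetical helper. sizes is ordered from youngest (index 0) to oldest.
     * Scans once from newIndex toward older components and returns the inclusive
     * {start, end} indexes of a window anchored at newIndex, or null if the
     * window holds fewer than minCount components.
     */
    static int[] findWindow(List<Long> sizes, int newIndex, long maxTotalSize, int minCount) {
        long total = 0;
        int end = newIndex - 1; // empty window until a component is accepted
        for (int i = newIndex; i < sizes.size(); i++) {
            // Assumed criterion: stop as soon as the window would exceed the size cap.
            if (total + sizes.get(i) > maxTotalSize) {
                break;
            }
            total += sizes.get(i);
            end = i;
        }
        int count = end - newIndex + 1;
        return count >= minCount ? new int[] { newIndex, end } : null;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 10L, 10L, 100L, 1000L);
        int[] window = findWindow(sizes, 0, 50L, 3);
        System.out.println(window == null ? "no merge" : window[0] + ".." + window[1]);
    }
}
```

Because the scan starts at the new component and never revisits younger ones, each call touches at most n components, which is the O(n) bound discussed above.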


Thoughts?

Maybe we need an offline discussion.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1818
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I464da3fed38cded0aee7b319a35664eae069a2ba
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Luo Chen <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: Yes
