Yingyi Bu has posted comments on this change.

Change subject: Avoid always merging old components in prefix policy
......................................................................

Patch Set 6:

(3 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1818/6/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java
File hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java:

PS6, Line 201: if (mergableIndexes != null) {
             :     return mergableIndexes.getRight() - mergableIndexes.getLeft() + 1;
             : } else {
             :     return 0;
             : }
return mergableIndexes == null ? 0 : mergableIndexes.getRight() - mergableIndexes.getLeft() + 1;


PS6, Line 248: for (int i = startIndex; i <= endIndex; i++) {
             :     mergableComponents.add(immutableComponents.get(i));
             : }
mergableComponents.addAll(immutableComponents.subList(startIndex, endIndex + 1)); ?


PS6, Line 273: private Pair<Integer, Integer> getMergableComponentsIndex(List<ILSMDiskComponent> immutableComponents)
It feels like there is some repeated work done in this method, because it is called for each new component that results from a merge.

I think we can potentially make this method an O(n) operation if we add a parameter newComponent to diskComponentAdded(final ILSMIndex index, boolean fullMergeIsRequested). (Note that there are scenarios where we are interested in a large number of components, e.g., Cloudberry, CB, for either better read performance or better ingestion performance.)

I'm thinking that maybe we can differentiate whether the new component resulted from a FLUSH or from a MERGE, based on their sizes, for example. Let's say we keep the list of components ordered from younger to older (without reversing it):

-- for a new FLUSH-result component, we just need a sliding window to decide whether it needs to be merged with older FLUSH-result components; (We need to do that because this method might not be invoked for a FLUSH-result component, because of line 58.)
-- for a new MERGE-result component Cm, we just need to check its preceding component to identify a contiguous mergeable window (if any).

We can rely on three properties to simplify the mergeable window selection here:
1. This method is called once each time a MERGE-result component is added;
2. A FLUSH-result component should probably only merge with a FLUSH-result component;
3. Whenever a new component is added, either from a FLUSH or from a MERGE, we only identify a mergeable window (with older components) starting from that component.

Thoughts? Maybe we need an offline discussion.


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1818
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I464da3fed38cded0aee7b319a35664eae069a2ba
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Luo Chen <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: Yes
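
For reference, the two small suggestions on Line 201 and Line 248 can be tried out in isolation. The following is only a minimal sketch, not the code in PrefixMergePolicy: the class MergeWindowSketch, the IndexRange type (a stand-in for Pair<Integer, Integer>), and the helper names windowSize and copyWindow are all hypothetical and exist only for illustration. It relies on List.subList taking an exclusive end bound, which is why the inclusive endIndex is passed as endIndex + 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; none of these names come from the patch.
class MergeWindowSketch {

    /** Stand-in for Pair<Integer, Integer>: an inclusive [left, right] index range. */
    static final class IndexRange {
        final int left;
        final int right;

        IndexRange(int left, int right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Number of components in the window, or 0 when no window was found (null). */
    static int windowSize(IndexRange range) {
        return range == null ? 0 : range.right - range.left + 1;
    }

    /**
     * Copies components[startIndex..endIndex] (both inclusive) into a new list.
     * subList's second argument is exclusive, hence endIndex + 1.
     */
    static <T> List<T> copyWindow(List<T> components, int startIndex, int endIndex) {
        List<T> mergable = new ArrayList<>();
        mergable.addAll(components.subList(startIndex, endIndex + 1));
        return mergable;
    }

    public static void main(String[] args) {
        List<String> components = Arrays.asList("c0", "c1", "c2", "c3");
        System.out.println(copyWindow(components, 1, 2));      // prints [c1, c2]
        System.out.println(windowSize(new IndexRange(1, 2)));  // prints 2
        System.out.println(windowSize(null));                  // prints 0
    }
}
```

Both forms behave the same as the loops they replace for valid indexes; subList throws IndexOutOfBoundsException for an out-of-range window, whereas the original loop would fail on the first bad get(i).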
