Luo Chen has posted comments on this change.

Change subject: Avoid always merging old components in prefix policy
......................................................................

Patch Set 6:

(3 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1818/6/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java
File hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/PrefixMergePolicy.java:

PS6, Line 201: if (mergableIndexes != null) {
             :     return mergableIndexes.getRight() - mergableIndexes.getLeft() + 1;
             : } else {
             :     return 0;
             : }
> return mergeableIndexes == null? 0: mergableIndexes.getRight() - mergableIn

Done


PS6, Line 248: for (int i = startIndex; i <= endIndex; i++) {
             :     mergableComponents.add(immutableComponents.get(i));
             : }
> mergableComponents.addAll(immutableComponents.subList(startIndex, endIndex+

Done


PS6, Line 273: private Pair<Integer, Integer> getMergableComponentsIndex(List<ILSMDiskComponent> immutableComponents)
> Oh, we shouldn't have the resultFromFlush flag because we don't always have

I'm not sure I fully understand this, but the idea sounds like a specialization of a level-based merge policy, where we only merge disk components that are in the same level. For example, newly flushed components would be in level 1, components produced by one round of merging would be in level 2, and so on. Moreover, disk components would also be ordered by level. The level information could be stored in the component metadata after each flush/merge.

Developing a new merge policy probably needs some more time, and definitely needs more experiments to see whether it works better and to understand the side effects of having more disk components at each level. I'll probably take a detailed look at this issue from a research perspective next Fall quarter (as I'll be starting a summer internship soon). Thus, this change is meant as a temporary fix (the "one line" fix suggested by Mike).

In terms of the complexity of finding a mergeable sequence, consider the layout of the disk components.
Say that, after a while, the system has 100 disk components (ordered from oldest to youngest). It is then almost always the case that the first 90 components or so (depending on the parameters) are too large and will be ignored by the policy (this is also the behavior of the previous prefix policy). The policy will then examine only the next 10 components or so, which wouldn't take much time.

--
To view, visit https://asterix-gerrit.ics.uci.edu/1818
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I464da3fed38cded0aee7b319a35664eae069a2ba
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Luo Chen <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: Yes
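
For illustration only, the 100-component layout argument in the last comment can be sketched in a few lines of Java. This is not the code in PrefixMergePolicy: it assumes components are ordered oldest to youngest, that the policy ignores a leading run of components larger than a size threshold, and that only the remaining tail is examined. The class name TailScanSketch, the method firstCandidateIndex, and the parameter maxMergableSize are hypothetical names chosen for this sketch.

```java
// Hypothetical sketch of the "skip the large old components, examine the tail" argument.
// Sizes are plain longs here; the real policy works on disk components.
class TailScanSketch {

    /**
     * Returns the index of the first component (oldest to youngest) whose size
     * does not exceed maxMergableSize, or sizes.length if every component is too large.
     */
    static int firstCandidateIndex(long[] sizesOldestToYoungest, long maxMergableSize) {
        int i = 0;
        while (i < sizesOldestToYoungest.length && sizesOldestToYoungest[i] > maxMergableSize) {
            i++; // too large: ignored by the policy
        }
        return i;
    }

    public static void main(String[] args) {
        // 100 components: 90 large (old, already merged) followed by 10 small (recently flushed).
        long[] sizes = new long[100];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = (i < 90) ? 1000L : 10L;
        }
        int start = firstCandidateIndex(sizes, 100L);
        // Only the components from 'start' onward need to be examined for a mergeable sequence.
        System.out.println("skipped=" + start + ", examined=" + (sizes.length - start));
    }
}
```

With the layout above, only the 10 youngest components are left to examine, so the search for a mergeable sequence stays cheap even when the total number of components is large.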
