Luo Chen has posted comments on this change.

Change subject: [ASTERIXDB-1946][STO][IDX]Create RTree/InvertedIdx for Correlated Datasets
......................................................................

Patch Set 6:

> @Chenluo, not much about this patch on the code side. I want to
> update some findings after using this patch on my test data.
>
> 1. Too many components?
> There are a lot more secondary components than the prefix policy
> generated. In one partition I have 557 inverted index components
> (there are 282 primary index components), and half of them are very
> tiny (e.g., 2M). Previously we only had 60 inverted indexes.
> 2. Performance is a bit slower than the prefix policy. (?)
> I did a simple count test for tweets containing "election" and
> "happy".
>
>            prefix    correlated
> election   88s       93s
> happy      190s      241.351s
>
> The performance is slower than the prefix policy, which contradicts
> our conjecture. Maybe it is also related to too many secondary
> components?
>
> It's not an objection to this patch. I think we still need to
> merge this one to complete the "correlated" policy. After that, we
> need more performance tests and deeper analysis to improve it in
> future patches. :-)

Just make sure you've re-ingested all tweets after this change... What about other secondary BTree and RTree indexes?

Currently, the correlated policy allows a secondary index to have more disk components. This is because merging an inverted index is typically slow, so when the next merge of the primary index is about to be scheduled, the previous merge of the inverted index may not have finished, and the inverted index is then skipped for this merge. The flow control mechanism (the isMergeLagging method) should ensure this discrepancy is bounded (though it now seems there is a bug)...

BTW, I think a related problem is what the proper range scan method is for an LSM index with many disk components.
Using a priority queue seems to save some work on sorting, but opening too many files at the same time would have many other side effects (disturbing sequential I/O, too much memory pressure, etc.).

--
To view, visit https://asterix-gerrit.ics.uci.edu/1845
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I100fc0b86b8a6fa36a95d77806107bad0307544e
Gerrit-PatchSet: 6
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Jianfeng Jia <[email protected]>
Gerrit-Reviewer: Luo Chen <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: No
