freemandealer opened a new pull request, #15785: URL: https://github.com/apache/doris/pull/15785
…big segment (#14174)

Signed-off-by: freemandealer <[email protected]>

# Proposed changes

Issue Number: close #xxx

## Problem summary

The check _segid_statistics_map.find(_num_segcompacted) == _segid_statistics_map.end() fails. This check ensures that _segid_statistics_map has no existing entry indexed by _num_segcompacted.

When does _segid_statistics_map gain an entry? After a segment is flushed, after segcompaction, or after renaming a big segment that does not need segcompaction. We call a segment that has never been touched by segcompaction a 'raw_seg'. When raw segments are compacted, their records are erased from _segid_statistics_map and a 'new_seg' (the compacted result) is added to the map as a replacement.

For example, take 7 raw segments 'oooOOoo' ('O' for a big segment, 'o' for a small one). We break them into four groups: 1) ooo, 2) O, 3) O, 4) oo.

- Group 1 is compacted to form 'new_seg_1-3', and raw_seg_1, raw_seg_2, and raw_seg_3 are wiped out.
- Group 2 is renamed from 'raw_seg_4' to 'new_seg_2' and added to the map.
- Group 3 is renamed from 'raw_seg_5' to 'new_seg_3' and added to the map.
- Group 4 is compacted to form 'new_seg_6-7', and raw_seg_6 and raw_seg_7 are wiped out.

Finally, we rename 'new_seg_1-3' to 'new_seg_1' and 'new_seg_6-7' to 'new_seg_4', so we end up with new_seg_1, new_seg_2, new_seg_3, and new_seg_4.

The problem appears for inputs that start with one or more big segments. Take 'OOoooo' as an example. We break it into 3 groups: 1) O, 2) O, 3) oooo. Group 1 would be renamed from 'raw_seg_1' to 'new_seg_1' and added to the map. Because the first segment is big, the filenames line up: the source and destination filenames are the same (ignore the raw/new prefixes, which are used only to tell the two apart in this description). This case must be handled carefully: we do not need to actually rename the file, but we should still count it as handled.
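The bookkeeping described above can be sketched as a standalone C++ toy model. This is not the Doris code. It assumes 1-based segment ids as in the description, models the keys of _segid_statistics_map as a plain set and _num_segcompacted as a local counter, and collapses the temporary 'new_seg_1-3' name and its final rename into a single step. Group, split_into_groups, replay, and the count_noop_rename flag are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model for illustration only; none of these names are Doris code.
// A group is an inclusive range of 1-based raw segment ids.
struct Group {
    int first;
    int last;
    bool big;  // true: a single big segment that is only renamed
};

// Splits a pattern such as "oooOOoo" into groups: each 'O' (big segment) is
// its own group, each maximal run of 'o' (small segments) is one group.
std::vector<Group> split_into_groups(const std::string& pattern) {
    std::vector<Group> groups;
    const int n = static_cast<int>(pattern.size());
    int run_start = 0;  // 0 means "no run of small segments is open"
    for (int id = 1; id <= n; ++id) {
        char seg = pattern[id - 1];
        assert(seg == 'O' || seg == 'o');
        if (seg == 'o') {
            if (run_start == 0) run_start = id;
            continue;
        }
        if (run_start != 0) {
            groups.push_back({run_start, id - 1, false});
            run_start = 0;
        }
        groups.push_back({id, id, true});
    }
    if (run_start != 0) groups.push_back({run_start, n, false});
    return groups;
}

// Replays the bookkeeping for one pattern. `ids` stands in for the keys of
// _segid_statistics_map and `next_id` for _num_segcompacted (1-based here to
// match the description). Returns false where the real check would fail.
bool replay(const std::string& pattern, bool count_noop_rename) {
    std::set<int> ids;
    const int n = static_cast<int>(pattern.size());
    for (int id = 1; id <= n; ++id) ids.insert(id);  // every flushed raw segment
    int next_id = 1;

    for (const Group& g : split_into_groups(pattern)) {
        if (g.big && g.first == next_id) {
            // Source and destination coincide: nothing to rename, but the
            // segment must still be counted as handled.
            if (count_noop_rename) ++next_id;
            continue;
        }
        // Compaction or a real rename: drop the raw entries, then add the
        // new entry under next_id, which must not be taken yet.
        for (int id = g.first; id <= g.last; ++id) ids.erase(id);
        if (ids.count(next_id) != 0) return false;  // the failing check
        ids.insert(next_id);
        ++next_id;
    }
    return true;
}
```

Under these assumptions, only replay("OOoooo", false) returns false; the other combinations of the two example patterns and the flag return true.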
If we fail to count it (that is, fail to increase _num_segcompacted), group 2 will still be renamed to 'new_seg_1', but 'new_seg_1' is already in the map, so the check fails.

Describe your changes.

## Checklist(Required)

1. Does it affect the original behavior:
   - [ ] Yes
   - [ ] No
   - [ ] I don't know
2. Has unit tests been added:
   - [ ] Yes
   - [ ] No
   - [ ] No Need
3. Has document been added or modified:
   - [ ] Yes
   - [ ] No
   - [ ] No Need
4. Does it need to update dependencies:
   - [ ] Yes
   - [ ] No
5. Are there any changes that cannot be rolled back:
   - [ ] Yes (If Yes, please explain WHY)
   - [ ] No

## Further comments

If this is a relatively large or complex change, kick off the discussion at [[email protected]](mailto:[email protected]) by explaining why you chose the solution you did and what alternatives you considered, etc...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
