freemandealer opened a new pull request, #15785:
URL: https://github.com/apache/doris/pull/15785

   …big segment (#14174)
   
   Signed-off-by: freemandealer <[email protected]>
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   The check _segid_statistics_map.find(_num_segcompacted) == 
_segid_statistics_map.end() fails.
   This check ensures that _segid_statistics_map has no existing entry 
indexed by _num_segcompacted.
   
   When does _segid_statistics_map gain an entry? In three cases:
   
   - after a segment is flushed, or
   - after segcompaction, or
   - after renaming a big segment that does not need segcompaction.
   
   We call a segment that has never been processed by segcompaction a 
'raw_seg'. When raw segments are compacted, their records are erased from 
_segid_statistics_map and a 'new_seg' (the compacted result) is added to the 
map as a replacement.
   
   For example:
   
   For 7 raw segments 'oooOOoo' ('O' for a big segment, 'o' for a small one), 
we break them into four groups: 1) ooo, 2) O, 3) O, 4) oo.
   Group 1 will be compacted to form 'new_seg_1-3', and raw_seg_1, raw_seg_2, 
raw_seg_3 are wiped out.
   Group 2 will be renamed from 'raw_seg_4' to 'new_seg_2' and added to the 
map.
   Group 3 will be renamed from 'raw_seg_5' to 'new_seg_3' and added to the 
map.
   Group 4 will be compacted to form 'new_seg_6-7', and raw_seg_6, raw_seg_7 
are wiped out.
   Finally, we rename 'new_seg_1-3' to 'new_seg_1' and 'new_seg_6-7' to 
'new_seg_4', so we end up with new_seg_1, new_seg_2, new_seg_3, and new_seg_4.
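
   The grouping used in this example can be sketched as below. This is only 
an illustration of the description above, not code from this PR: the helper 
name split_groups is hypothetical, and it ignores any limit on how many small 
segments may go into one group.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Doris): split a pattern such as "oooOOoo"
// into groups. Each 'O' (big segment) forms its own group; consecutive 'o'
// (small segments) are merged into a single group.
std::vector<std::string> split_groups(const std::string& pattern) {
    std::vector<std::string> groups;
    for (char c : pattern) {
        // A small segment extends the previous group only if that group is
        // itself a run of small segments.
        bool extend_small_run =
                c == 'o' && !groups.empty() && groups.back().back() == 'o';
        if (extend_small_run) {
            groups.back().push_back(c);
        } else {
            groups.push_back(std::string(1, c));
        }
    }
    return groups;
}
```

   Under these assumptions, split_groups("oooOOoo") yields the four groups 
listed above: ooo, O, O, oo.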
   
   The problem occurs when the sequence starts with one or more big segments.
   Take 'OOoooo' as an example. We break it into 3 groups: 1) O, 2) O, 3) 
oooo.
   Group 1 would be renamed from 'raw_seg_1' to 'new_seg_1' and added to the 
map. Because the first segment is big, the filenames coincide: the src 
filename and the dst filename are the same (ignoring the raw/new prefix, which 
is only used to tell them apart in this comment).
   
   This case must be handled carefully. We do not need to actually rename the 
segment, but we should count it as handled. If we fail to count it (that is, 
fail to increase _num_segcompacted), the following group 2 will still try to be 
renamed to 'new_seg_1', but 'new_seg_1' is already in the map, which 
ultimately causes the check to fail.
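
   The failure and the fix can be reproduced with a small model. The sketch 
below is a simplification under stated assumptions, not the Doris 
implementation: SegcompactionModel, rename_big, compact and 
run_big_first_case are hypothetical names, ids are 1-based as in the text 
above, and the flag count_noop_rename switches between the old behaviour 
(false) and the behaviour this PR describes (true).

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified, hypothetical model of the bookkeeping described in this PR.
struct SegcompactionModel {
    std::map<int, std::string> stats;  // stands in for _segid_statistics_map
    int num_segcompacted = 1;          // stands in for _num_segcompacted
    bool count_noop_rename;            // true models the fix

    SegcompactionModel(int raw_segments, bool count_noop)
            : count_noop_rename(count_noop) {
        // Every flushed raw segment gets an entry.
        for (int id = 1; id <= raw_segments; ++id) stats[id] = "raw";
    }

    // Models the check: the destination id must not be in the map yet.
    bool dst_is_free() const { return stats.count(num_segcompacted) == 0; }

    // A big segment is renamed to the next compacted id, not rewritten.
    // Returns false if the check fails.
    bool rename_big(int seg_id) {
        if (seg_id == num_segcompacted) {
            // src and dst coincide: nothing to rename, but the segment
            // still has to be counted as handled.
            if (count_noop_rename) ++num_segcompacted;
            return true;
        }
        if (!dst_is_free()) return false;
        stats[num_segcompacted] = "renamed";
        stats.erase(seg_id);
        ++num_segcompacted;
        return true;
    }

    // Raw segments first..last are erased and replaced by one new entry.
    // Returns false if the check fails.
    bool compact(int first, int last) {
        for (int id = first; id <= last; ++id) stats.erase(id);
        if (!dst_is_free()) return false;
        stats[num_segcompacted] = "compacted";
        ++num_segcompacted;
        return true;
    }
};

// Runs the 'OOoooo' case; returns true if no check fails.
bool run_big_first_case(bool count_noop_rename) {
    SegcompactionModel m(6, count_noop_rename);
    return m.rename_big(1) && m.rename_big(2) && m.compact(3, 6);
}
```

   In this model, run_big_first_case(false) fails at group 2 because id 1 is 
still in the map, while run_big_first_case(true) passes all three groups.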
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[[email protected]](mailto:[email protected]) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

