jack86596 opened a new pull request #4232:
URL: https://github.com/apache/carbondata/pull/4232


    ### Why is this PR needed?
   Currently, the clean files command deletes all Marked for Delete and 
Compacted segments once the number of these segments reaches 
carbon.invisible.segments.preserve.count. This delete operation may take a lot 
of time, and the user cannot choose to delete only some of these segments. It is 
better to enhance the clean files command so that the segments to be 
deleted can be specified.
    
    ### What changes were proposed in this PR?
   1. The clean files command supports specifying segment ids. The syntax is "clean files 
for table table_name options("segment_ids"="id1,id2,id3...")". If segment ids are 
specified, only the segments with these ids are deleted physically.
   2. Refactored lock taking: during clean files, the tablestatus lock is taken at 
the beginning and released at the end. While the lock is held, the tablestatus 
file is read only once (previously it could be read 10+ times), and all 
operations are done on that copy, such as changing the visibility of a segment or 
moving segments with visibility = false to the tablestatus.history file.
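
The lock handling described in item 2 can be illustrated with a minimal Python sketch. It is not the code in this PR: `tablestatus_lock`, `clean_files`, the callback parameters, and the dictionary layout of a segment are all hypothetical stand-ins, and the rule used here to decide which segments become invisible is a simplification. It only shows the pattern of taking the lock once, reading the status once, and applying every change to that single copy.

```python
import threading

# Hypothetical stand-in for the tablestatus lock (process-local here).
tablestatus_lock = threading.Lock()

# Assumption for illustration: these statuses are the ones made invisible.
STALE_STATUSES = {"Marked for Delete", "Compacted"}


def clean_files(read_tablestatus, write_tablestatus, append_history):
    """Hold the lock for the whole operation and read the status only once."""
    with tablestatus_lock:  # taken at the beginning, released at the end
        segments = read_tablestatus()  # the single read while the lock is held

        # All changes are applied to the in-memory copy.
        for segment in segments:
            if segment["status"] in STALE_STATUSES:
                segment["visible"] = False

        history = [s for s in segments if not s["visible"]]
        remaining = [s for s in segments if s["visible"]]

        append_history(history)       # invisible segments go to the history
        write_tablestatus(remaining)  # visible segments stay in tablestatus
    return remaining, history
```

Passing the read and write steps in as callbacks keeps the sketch independent of any storage layer and makes it easy to count how often the status is read.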
       
    ### Does this PR introduce any user interface change?
    - Yes. One more option is added to the clean files command: segment_ids. Its 
value is the ids of the segments the user wants to delete. Only the ids of Marked 
for Delete and Compacted segments are valid. If invalid ids are given, the 
operation fails immediately. If segments are specified, the force option is ignored.
   CLEAN FILES FOR TABLE TABLE_NAME options('segment_ids'='0,1,2')
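
The validation rule for segment_ids can be sketched as follows. This is a Python illustration under stated assumptions, not the PR's implementation: the function names, the `segment_status` mapping from segment id to status, and the error type are hypothetical. It shows the stated behaviour that only Marked for Delete and Compacted ids are accepted and that any invalid id makes the whole request fail.

```python
# Statuses that the PR description names as valid for segment_ids.
DELETABLE_STATUSES = {"Marked for Delete", "Compacted"}


def parse_segment_ids(option_value):
    """Split a value such as '0,1,2' into a list of id strings."""
    return [part.strip() for part in option_value.split(",") if part.strip()]


def select_segments_to_delete(option_value, segment_status):
    """Return the requested ids, or raise if any id is not deletable.

    segment_status is a hypothetical mapping of segment id -> status string.
    """
    ids = parse_segment_ids(option_value)
    invalid = [i for i in ids if segment_status.get(i) not in DELETABLE_STATUSES]
    if invalid:
        # Fail the whole operation rather than deleting a subset.
        raise ValueError("invalid segment ids: " + ",".join(invalid))
    return ids
```

With statuses `{"0": "Compacted", "1": "Marked for Delete", "2": "Success"}`, the value `'0,1'` is accepted, while `'0,2'` is rejected because segment 2 is neither Marked for Delete nor Compacted.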
   
    ### Is any new testcase added?
    - Yes
   
       
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


