GitHub user dhatchayani reopened a pull request:

    https://github.com/apache/carbondata/pull/1702

    [CARBONDATA-1896] Clean files operation improvement

    **Problem:**
    At session startup, the clean operation marks every segment in the 
INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS state as MARKED_FOR_DELETE 
in the tablestatus file. This operation does not account for other sessions 
running in parallel. If another session's data load is in progress when a new 
session starts, the segment of that running load is also changed to 
MARKED_FOR_DELETE, regardless of the actual load status. Cleaning stale 
segments during startup also increases the time needed to bring up a session.
    
    **Solution:**
    A SEGMENT_LOCK should be taken on the new segment while loading.
    While cleaning segments, both the tablestatus file and the SEGMENT_LOCK 
should be considered.
    Cleaning of stale files should be removed from session startup. It can 
instead be done manually on the required tables through the existing 
CLEAN FILES DDL, or the next load on the table will clean them.
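
One way to read the solution above is that an in-progress segment counts as stale only when no running load holds its segment lock. The sketch below illustrates that reading in Java; it is not CarbonData code and is not taken from the patch. `StaleSegmentSketch`, `Segment`, `SegmentLocks`, and `markStaleSegments` are hypothetical names, the lock is an in-memory stand-in for whatever lock the patch actually uses, and the enum lists only the states named in this description, plus `SUCCESS` as an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; none of these types are CarbonData classes. */
public class StaleSegmentSketch {

    /** Hypothetical subset of segment states (SUCCESS is an assumption). */
    enum Status { SUCCESS, INSERT_IN_PROGRESS, INSERT_OVERWRITE_IN_PROGRESS, MARKED_FOR_DELETE }

    /** Hypothetical in-memory stand-in for one tablestatus entry. */
    static final class Segment {
        final String id;
        Status status;

        Segment(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Hypothetical stand-in for per-segment locks; a real lock would be shared across sessions. */
    static final class SegmentLocks {
        private final Set<String> held = new HashSet<>();

        /** Returns true if the lock was free and is now held by the caller. */
        synchronized boolean tryLock(String segmentId) {
            return held.add(segmentId);
        }

        synchronized void unlock(String segmentId) {
            held.remove(segmentId);
        }
    }

    /**
     * Marks an in-progress segment for deletion only when its lock can be
     * acquired, i.e. no running load owns it. Returns the ids that were marked.
     */
    static List<String> markStaleSegments(List<Segment> tableStatus, SegmentLocks locks) {
        List<String> marked = new ArrayList<>();
        for (Segment segment : tableStatus) {
            boolean inProgress = segment.status == Status.INSERT_IN_PROGRESS
                    || segment.status == Status.INSERT_OVERWRITE_IN_PROGRESS;
            if (!inProgress) {
                continue; // finished or already marked segments are left alone
            }
            if (!locks.tryLock(segment.id)) {
                continue; // a live load still holds the lock, so the segment is not stale
            }
            try {
                segment.status = Status.MARKED_FOR_DELETE;
                marked.add(segment.id);
            } finally {
                locks.unlock(segment.id);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        SegmentLocks locks = new SegmentLocks();
        List<Segment> tableStatus = new ArrayList<>();
        tableStatus.add(new Segment("0", Status.SUCCESS));
        tableStatus.add(new Segment("1", Status.INSERT_IN_PROGRESS)); // abandoned load, lock is free
        tableStatus.add(new Segment("2", Status.INSERT_IN_PROGRESS)); // load still running
        locks.tryLock("2"); // the running load holds its segment lock

        System.out.println(markStaleSegments(tableStatus, locks));
    }
}
```

In this example only segment 1 is marked: segment 0 is not in progress, and segment 2 is protected by the lock that its running load holds. Under the old behaviour described in the problem statement, segment 2 would have been marked as well.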
    
    
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [x] Testing done
            Manual Testing
     - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/incubator-carbondata clean_files

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1702
    
----
commit 4573f5fbcc7d0414323513e8746f9050f9eb1e78
Author: dhatchayani <dhatcha.official@...>
Date:   2017-12-20T17:05:31Z

    [CARBONDATA-1896] Clean files operation improvement

----

