GitHub user kevinjmh opened a pull request:

    https://github.com/apache/carbondata/pull/3023

    [CARBONDATA-3197][BloomDataMap] Merge bloom index before accessible

    **Problem**
    Currently, Carbon allows queries while bloom index files are being merged,
    but this causes problems when the index files change state from multiple
    shards to a merged shard.
    
    Timeline illustrating the problem:
    - Data is loaded into a table with a bloom datamap. Bloom index files are
      generated during loading, and bloom index file merging is in progress.
    - A query is fired.
    - `BloomCoarseGrainDataMapFactory.getAllShardPaths` finds multiple shards
      while bloom index file merging is in progress, so a
      `BloomCoarseGrainDataMap` is created with the detailed shard names.
    - Bloom index file merging finishes, and the folders with detailed shard
      names are deleted.
    - An exception occurs when `BloomCoarseGrainDataMap` tries to read bloom
      index files from the folders with detailed shard names during pruning.
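    To make the timeline concrete, the following sketch replays it against a
    toy, single-threaded, in-memory model of one segment's bloom index
    folders. Everything in it (`BloomIndexStore`, `RaceTimeline`, the
    `mergeShard` folder name, the shard names) is hypothetical and only
    mirrors the behaviour described in the timeline; it is not CarbonData code.

    ```java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical toy model of one segment's bloom index folders; not a CarbonData API.
    final class BloomIndexStore {
        static final String MERGE_SHARD = "mergeShard"; // assumed name of the merged folder
        private final Set<String> shardDirs = new HashSet<>();
        private boolean mergeInProgress;

        void writeShard(String shardName) {
            shardDirs.add(shardName);
        }

        void startMerge() {
            mergeInProgress = true;
        }

        // Writes the merged folder, then deletes the folders with detailed shard names.
        void finishMerge() {
            shardDirs.add(MERGE_SHARD);
            shardDirs.removeIf(name -> !name.equals(MERGE_SHARD));
            mergeInProgress = false;
        }

        // Mimics the described behaviour: while merging is in progress,
        // the detailed shard folders are handed to the query.
        List<String> getAllShardPaths() {
            List<String> result = new ArrayList<>();
            if (!mergeInProgress && shardDirs.contains(MERGE_SHARD)) {
                result.add(MERGE_SHARD);
                return result;
            }
            for (String name : shardDirs) {
                if (!name.equals(MERGE_SHARD)) {
                    result.add(name);
                }
            }
            return result;
        }

        // Stands in for reading a bloom index file while pruning.
        void readShard(String shardName) {
            if (!shardDirs.contains(shardName)) {
                throw new IllegalStateException("bloom index folder not found: " + shardName);
            }
        }
    }

    final class RaceTimeline {
        // Replays the timeline step by step; returns true if pruning fails.
        static boolean queryFailsDuringMerge() {
            BloomIndexStore store = new BloomIndexStore();
            store.writeShard("shard_0");                     // data loaded, shards generated
            store.writeShard("shard_1");
            store.startMerge();                              // merge running, segment already queryable
            List<String> planned = store.getAllShardPaths(); // query captures detailed shard names
            store.finishMerge();                             // detailed shard folders deleted
            try {
                for (String shard : planned) {
                    store.readShard(shard);                  // pruning reads a deleted folder
                }
                return false;
            } catch (IllegalStateException e) {
                return true;
            }
        }
    }
    ```

    In this model `RaceTimeline.queryFailsDuringMerge()` returns `true`: the
    query captured `shard_0` and `shard_1`, and those folders no longer exist
    by the time it prunes.
    
    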
    
    **Analysis**
    The root cause is that we allow queries on a datamap that is not in a
    stable state. One solution is to disable the datamap while bloom index
    files are being merged, but this would affect all segments, and would do
    so repeatedly. Another solution is to make bloom index file merging part
    of loading, so that queries cannot access the unstable bloom index files
    until they are ready.
    
    **Solution**
    Change the events that `MergeBloomIndexEventListener` listens for, so that
    the merge is done before the segment status is updated and the segment
    becomes accessible.
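    A minimal sketch of the reordering, assuming a single-threaded load and a
    simplified two-state segment status: the merge step is called inside the
    load, before the status change that makes the segment visible. `LoadFlow`,
    `SegmentStatus` and the folder names are hypothetical; in the actual change
    the ordering comes from which load events `MergeBloomIndexEventListener`
    subscribes to, which this sketch does not model.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the reordered load flow. Class, method and status names
    // are illustrative only and are not CarbonData classes or events.
    final class LoadFlow {
        enum SegmentStatus { INSERT_IN_PROGRESS, SUCCESS }

        private SegmentStatus status = SegmentStatus.INSERT_IN_PROGRESS;
        private final List<String> bloomShards = new ArrayList<>();
        final List<String> events = new ArrayList<>();

        void load() {
            bloomShards.add("shard_0");      // bloom index files generated during loading
            bloomShards.add("shard_1");
            events.add("shards written while status=" + status);
            mergeBloomIndex();               // merge runs BEFORE the status update
            status = SegmentStatus.SUCCESS;  // only now does the segment become accessible
            events.add("status updated to " + status);
        }

        private void mergeBloomIndex() {
            // Replace the detailed shard folders with one merged folder (assumed name).
            bloomShards.clear();
            bloomShards.add("mergeShard");
            events.add("bloom index merged while status=" + status);
        }

        // Queries only see segments whose status is SUCCESS, so they never observe
        // the detailed shard folders that the merge deletes.
        List<String> shardsVisibleToQuery() {
            return status == SegmentStatus.SUCCESS ? new ArrayList<>(bloomShards) : new ArrayList<>();
        }
    }
    ```

    After `load()`, `events` records the merge as happening while the status
    is still `INSERT_IN_PROGRESS`, and a query can only ever see `mergeShard`.
    One consequence of this design is that a load only completes once the
    merge has finished.
    
    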
    
    Be sure to complete the following checklist to help us incorporate
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on:
            - Whether new unit test cases have been added, or why no new tests
              are required?
            - How is it tested? Please attach a test report.
            - Is it a performance-related change? Please attach the performance
              test report.
            - Any additional information to help reviewers in testing this
              change.

     - [ ] For large changes, please consider breaking it into sub-tasks under
           an umbrella JIRA.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata mergeBloomIndexEvent

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/3023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3023
    
----
commit 881a510de38e30cfa4ae8a84be1f003c6254d9ab
Author: Manhua <kevinjmh@...>
Date:   2018-12-25T08:21:40Z

    merge bloom index before accessible

----

