GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/1715

    [CARBONDATA-1934] Incorrect results are returned by select query in case 
when the number of blocklets for one part file are > 1 in the same task

    Problem: When a select query is triggered, the driver prunes the segments
and returns the list of blocklets that need to be scanned. The number of Spark
tasks equals the number of blocklets identified.
    When one task has more than one blocklet for the same file, the
BlockExecutionInfo that gets formed is incorrect, so the query returns
incorrect results.
    
    Fix: Use the abstract index to fill all the details in BlockExecutionInfo
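
The message does not include the affected code, so the following is only a toy illustration of how this class of bug can arise, not the actual CarbonData logic. It assumes, purely for illustration, that execution details are built once per file rather than once per blocklet; the class `BlockletGroupingSketch`, its methods, and the file name `part-0-0.carbondata` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from CarbonData.
class BlockletGroupingSketch {

    /** Stand-in for one pruned blocklet: the part file it belongs to and its id in that file. */
    static final class Blocklet {
        final String filePath;
        final int blockletId;

        Blocklet(String filePath, int blockletId) {
            this.filePath = filePath;
            this.blockletId = blockletId;
        }
    }

    /**
     * Assumed faulty variant: execution details are keyed by file path, so a
     * second blocklet of the same file overwrites the first one.
     */
    static Map<String, Integer> onePerFile(List<Blocklet> task) {
        Map<String, Integer> infoByFile = new LinkedHashMap<>();
        for (Blocklet b : task) {
            infoByFile.put(b.filePath, b.blockletId);
        }
        return infoByFile;
    }

    /** Assumed corrected variant: one entry per blocklet, so nothing is dropped. */
    static List<String> onePerBlocklet(List<Blocklet> task) {
        List<String> infos = new ArrayList<>();
        for (Blocklet b : task) {
            infos.add(b.filePath + "#" + b.blockletId);
        }
        return infos;
    }

    public static void main(String[] args) {
        // One task that was handed two blocklets of the same part file.
        List<Blocklet> task = Arrays.asList(
                new Blocklet("part-0-0.carbondata", 0),
                new Blocklet("part-0-0.carbondata", 1));

        System.out.println(onePerFile(task).size());     // 1: one blocklet is lost
        System.out.println(onePerBlocklet(task).size()); // 2: both blocklets are scanned
    }
}
```

In this sketch the per-file variant scans only one of the two blocklets, which would surface to the user as missing rows in the query result.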
    
     - [ ] Any interfaces changed?
    No 
     - [ ] Any backward compatibility impacted?
    No
     - [ ] Document update required?
    No
     - [ ] Testing done
    Manual testing
     - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
    NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata data_loss_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1715
    
----
commit b0c518d4aa7d4b2387899deefc0f9ed39b5c463c
Author: manishgupta88 <tomanishgupta18@...>
Date:   2017-12-22T10:35:31Z

    Incorrect results are returned by select query in case when the number of 
blocklets for one part file are > 1 in the same task

----

