GitHub user sounakr opened a pull request:
https://github.com/apache/incubator-carbondata/pull/604
[CARBONDATA-691] After Compaction records count are mismatched.
**Problem**: After compaction, the record count does not match the actual count.
**Analysis**: The partitioning logic used for compaction was wrong. The
getPartition method of CarbonScanRDD.scala is supposed to build a list of all
the blocks of all the segments that need to be merged and then partition that
list by taskNo, so that each executor receives one partitioned list. Currently,
however, the complete list of blocks is sent to every executor after
partitioning. Because each executor merges all the blocks of all the segments,
running multiple executors duplicates the data.
**Fix**: Correct the getPartition method logic so that each executor receives
only its own list of blocks.
Also fix Horizontal Partitioning, which was merged with IUD.
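
The difference between the intended and the faulty partitioning can be sketched in a few lines of Scala. This is only an illustration of the idea described above, not the code in this pull request: `Block`, `partitionByTaskNo`, and `partitionWithFullList` are hypothetical names, and the real block and partition classes in CarbonData differ.

```scala
// Hypothetical stand-in for a block descriptor; not CarbonData's actual class.
final case class Block(segmentId: String, taskNo: String, path: String)

object CompactionPartitionSketch {

  // Intended behaviour: group the blocks of all segments by taskNo,
  // so that each partition holds only the blocks for one task.
  def partitionByTaskNo(blocks: Seq[Block]): Seq[Seq[Block]] =
    blocks.groupBy(_.taskNo).toSeq.sortBy(_._1).map(_._2)

  // Faulty behaviour described in the analysis: the partitions are
  // computed, but every partition is handed the complete block list.
  def partitionWithFullList(blocks: Seq[Block]): Seq[Seq[Block]] =
    partitionByTaskNo(blocks).map(_ => blocks)

  def main(args: Array[String]): Unit = {
    // Two segments, each with one block for task 0 and one for task 1.
    val blocks = Seq(
      Block("0", "0", "seg0-task0"),
      Block("0", "1", "seg0-task1"),
      Block("1", "0", "seg1-task0"),
      Block("1", "1", "seg1-task1")
    )

    val correct = partitionByTaskNo(blocks)
    val faulty  = partitionWithFullList(blocks)

    // Total number of blocks merged across all partitions.
    println(s"input blocks:           ${blocks.size}")
    println(s"merged (per-task list): ${correct.map(_.size).sum}")
    println(s"merged (full list):     ${faulty.map(_.size).sum}")
  }
}
```

With two tasks, the faulty variant processes every block twice, which matches the doubled record count reported in the problem statement.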
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sounakr/incubator-carbondata master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/604.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #604
----
commit 4e5ea804a5ab36d79efdb4df425e729245e990ee
Author: sounakr <[email protected]>
Date: 2017-02-17T14:42:39Z
Compaction Partitioning changes
----