GitHub user NamanRastogi opened a pull request:
https://github.com/apache/carbondata/pull/3029
[CARBONDATA-3200] No-Sort compaction
When data is loaded with SORT_SCOPE set to NO_SORT and then compacted, the
data remains unsorted, so compaction does little to improve query performance.
Yet the main purpose of compaction is to pack the data better and improve query
performance.
With this change, compaction sorts the data so that query performance
improves after compaction. The data is sorted on the columns specified in
SORT_COLUMNS.
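As a rough illustration of the intended behaviour, the following sketch gathers the rows of several NO_SORT segments and orders them by the sort columns, comparing the first sort column first and using later ones as tie-breakers. It is a simplified, in-memory sketch that assumes rows are `Object[]` and that SORT_COLUMNS has already been resolved to column indices. The class and method names (`NoSortCompactionSketch`, `compact`, `bySortColumns`) are hypothetical and are not CarbonData APIs. This PR routes compaction through a HybridSortProcessor, whose design is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only: not CarbonData's actual compaction code.
public class NoSortCompactionSketch {

    // Orders rows by the given sort-column indices: the first index is the
    // primary key, later indices break ties.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Comparator<Object[]> bySortColumns(int[] sortColumnIndices) {
        return (a, b) -> {
            for (int idx : sortColumnIndices) {
                int cmp = ((Comparable) a[idx]).compareTo(b[idx]);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };
    }

    // Concatenates the rows of every input segment, then sorts them so the
    // compacted output is ordered by the sort columns even if the inputs
    // were loaded unsorted (NO_SORT).
    static List<Object[]> compact(List<List<Object[]>> segments, int[] sortColumnIndices) {
        List<Object[]> merged = new ArrayList<>();
        for (List<Object[]> segment : segments) {
            merged.addAll(segment);
        }
        merged.sort(bySortColumns(sortColumnIndices));
        return merged;
    }

    public static void main(String[] args) {
        // Two unsorted segments; columns are (city, id, amount).
        List<Object[]> seg1 = Arrays.asList(
            new Object[] {"Delhi", 3, 10.0},
            new Object[] {"Agra", 7, 5.0});
        List<Object[]> seg2 = Arrays.asList(
            new Object[] {"Delhi", 1, 2.5},
            new Object[] {"Bhopal", 4, 8.0});
        // Assumed SORT_COLUMNS = "city,id", i.e. column indices 0 and 1.
        List<Object[]> out = compact(Arrays.asList(seg1, seg2), new int[] {0, 1});
        for (Object[] row : out) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints the rows ordered by city and then by id, with the two Delhi rows ordered by id. A production compaction would typically sort in bounded chunks and merge the sorted runs rather than hold every row in memory.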
- [ ] Any interfaces changed? --> No
- [ ] Any backward compatibility impacted? --> No
- [ ] Document update required? --> No
- [ ] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NamanRastogi/carbondata nosort_compaction
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/3029.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3029
----
commit f9e0142149ccd83a48f828bf032842b2a18ce90d
Author: namanrastogi <naman.rastogi.52@...>
Date: 2018-12-27T13:26:18Z
Added HybridSortProcessor
commit d406a9f595558f2f027a56425b0f432b534e47c8
Author: namanrastogi <naman.rastogi.52@...>
Date: 2018-12-21T16:48:15Z
Added flow for HybridSorterProcessor.
TODO: Implement HybridSorterProcessor itself.
----