GitHub user jackylk opened a pull request:
https://github.com/apache/incubator-carbondata/pull/242
[CARBONDATA-318] Implement an InMemory Sorter that makes maximum usage of memory for data load
This PR makes the following changes:
1. Change SortDataRows.java to keep rows and sort them in memory when memory
is sufficient; otherwise, spill them to disk.
2. Change SortKeyStep and MdkeyGenStep to support both in-memory sort and
merge sort.
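The two changes above can be illustrated with a small sketch. This is a hypothetical Java example, not the code in SortDataRows: the class name `SpillingSorter`, its methods, the use of plain `long` values as rows, and temp files as spill runs are all assumptions. It buffers up to `sortSize` rows. If the input never exceeds that limit, it sorts in memory. Otherwise, it writes each full buffer to disk as a sorted run and merges the runs with a min-heap (a k-way merge sort).

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/**
 * Hypothetical sketch (not CarbonData's SortDataRows): rows are buffered in
 * memory up to sortSize. If the whole input fits, it is sorted in memory;
 * otherwise each full buffer is sorted and spilled to a temp file as a run,
 * and the runs are combined with a k-way merge. Rows are plain longs here.
 */
public class SpillingSorter {
  private final int sortSize;                        // max rows held in memory
  private final List<Long> buffer = new ArrayList<>();
  private final List<Path> runs = new ArrayList<>(); // sorted runs on disk
  private int spillCount = 0;

  public SpillingSorter(int sortSize) {
    if (sortSize <= 0) throw new IllegalArgumentException("sortSize must be > 0");
    this.sortSize = sortSize;
  }

  /** Adds a row, spilling the buffer first if it is already full. */
  public void add(long row) throws IOException {
    if (buffer.size() == sortSize) spill();
    buffer.add(row);
  }

  public int spillCount() { return spillCount; }

  /** Returns all rows in ascending order; call once, after the last add(). */
  public List<Long> finish() throws IOException {
    if (runs.isEmpty()) {              // input never exceeded sortSize: in-memory sort
      Collections.sort(buffer);
      return new ArrayList<>(buffer);
    }
    spill();                           // flush the final partial run (never empty here)
    return mergeRuns();
  }

  private void spill() throws IOException {
    Collections.sort(buffer);
    Path run = Files.createTempFile("sort-run-", ".bin");
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(Files.newOutputStream(run)))) {
      out.writeInt(buffer.size());     // row-count header
      for (long v : buffer) out.writeLong(v);
    }
    runs.add(run);
    spillCount++;
    buffer.clear();
  }

  /** K-way merge: a min-heap holds the current head row of every run. */
  private List<Long> mergeRuns() throws IOException {
    List<DataInputStream> inputs = new ArrayList<>();
    int[] remaining = new int[runs.size()];
    // heap entry = {row value, run index}
    PriorityQueue<long[]> heap =
        new PriorityQueue<long[]>((a, b) -> Long.compare(a[0], b[0]));
    List<Long> result = new ArrayList<>();
    try {
      for (int i = 0; i < runs.size(); i++) {
        DataInputStream in = new DataInputStream(
            new BufferedInputStream(Files.newInputStream(runs.get(i))));
        inputs.add(in);
        remaining[i] = in.readInt();
        pushNext(heap, in, remaining, i);
      }
      while (!heap.isEmpty()) {
        long[] head = heap.poll();
        result.add(head[0]);
        int i = (int) head[1];
        pushNext(heap, inputs.get(i), remaining, i);
      }
    } finally {
      for (DataInputStream in : inputs) in.close();
      for (Path p : runs) Files.deleteIfExists(p);
      runs.clear();
    }
    return result;
  }

  /** Reads the next row of run i into the heap, if that run has rows left. */
  private static void pushNext(PriorityQueue<long[]> heap, DataInputStream in,
      int[] remaining, int i) throws IOException {
    if (remaining[i] > 0) {
      heap.add(new long[] {in.readLong(), i});
      remaining[i]--;
    }
  }
}
```

Because the heap holds only one row per run, the merge phase keeps roughly k rows in memory for k runs, plus stream buffers. That is what lets the spill path handle inputs much larger than `sortSize`.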
To choose between these two approaches, users can set SORT_SIZE in the carbon
properties. For example, to set it to 3 million rows:
```
// Number of rows to keep in memory when loading data. If the number of input
// rows exceeds this value, carbon will use merge sort instead of in-memory sort.
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.SORT_SIZE, "3000000");
```
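As an illustration of how such a setting might be read and applied, here is a minimal sketch that assumes a plain `java.util.Properties` source rather than CarbonProperties. The class `SortSizeConfig`, its methods, the fallback value, and the key string `carbon.sort.size` (assumed to be what `CarbonCommonConstants.SORT_SIZE` holds) are all hypothetical. The decision rule follows the comment in the snippet above: more input rows than SORT_SIZE means merge sort.

```java
import java.util.Properties;

/** Hypothetical helper: reads the sort size and picks a sort strategy. */
public class SortSizeConfig {
  enum Strategy { IN_MEMORY, MERGE_SORT }

  // Assumed key string; in CarbonData, use CarbonCommonConstants.SORT_SIZE instead.
  static final String SORT_SIZE_KEY = "carbon.sort.size";

  /** Parses the configured row limit, falling back when it is missing or invalid. */
  static int resolveSortSize(Properties props, int fallback) {
    String raw = props.getProperty(SORT_SIZE_KEY);
    if (raw == null) return fallback;
    try {
      int value = Integer.parseInt(raw.trim());
      return value > 0 ? value : fallback;   // reject zero and negative limits
    } catch (NumberFormatException e) {
      return fallback;                       // non-numeric value
    }
  }

  /** Inputs up to the limit stay in memory; anything larger goes to merge sort. */
  static Strategy choose(long inputRows, int sortSize) {
    return inputRows > sortSize ? Strategy.MERGE_SORT : Strategy.IN_MEMORY;
  }
}
```

In this sketch, falling back on a missing or non-numeric value keeps a typo in the property from failing the load. The PR does not say whether CarbonData itself behaves this way.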
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata in-memory-sort
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/242.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #242
----
commit 23d9fbbca33cd5bb56a0a648596ad4d61c32fa04
Author: jackylk
Date: 2016-10-15T15:53:16Z
add in memory sort in data load
commit 45dae7c4c7b535540bafb15bcc896061cfae7ca7
Author: jackylk
Date: 2016-10-15T17:55:09Z
fix empty row
----