GitHub user ajantha-bhat opened a pull request:
https://github.com/apache/carbondata/pull/2345
[wip] Improve Carbon Reader Schema reading performance on S3
Problem: Currently the Carbon reader reads the schema from the carbondata file.
On S3 this results in multiple I/O operations, because the read buffer is small
and the data file is large.
Solution: Read the schema from the index file instead, and read the index file
in a single I/O operation using a buffer equal to the index file size.
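
To illustrate why the buffer size matters, here is a minimal Java sketch that is not taken from the patch. It counts how many bulk read calls are needed to consume a file with a given buffer size, using an in-memory stream as a stand-in for a remote object. The class `SingleReadSketch`, the helper `countReads`, and the idea that each read call corresponds to one remote request are assumptions made purely for illustration; the actual reader code in CarbonData may be structured differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SingleReadSketch {

    /** Wraps a stream and counts bulk read calls (hypothetical stand-in for remote requests). */
    static final class CountingInputStream extends InputStream {
        private final InputStream in;
        int readCalls = 0;

        CountingInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            readCalls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            readCalls++;
            return in.read(b, off, len);
        }
    }

    /**
     * Reads a simulated file of fileSize bytes using a buffer of bufferSize bytes
     * and returns how many read calls were issued.
     */
    public static int countReads(int fileSize, int bufferSize) {
        CountingInputStream in =
            new CountingInputStream(new ByteArrayInputStream(new byte[fileSize]));
        byte[] buffer = new byte[bufferSize];
        int total = 0;
        try {
            while (total < fileSize) {
                // Never request more than what is left, so no extra call is made at the end.
                int n = in.read(buffer, 0, Math.min(bufferSize, fileSize - total));
                if (n < 0) {
                    break; // unexpected end of stream
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.readCalls;
    }

    public static void main(String[] args) {
        int fileSize = 1_000_000; // assumed size, for illustration only
        System.out.println("small buffer reads: " + countReads(fileSize, 4096));
        System.out.println("file-sized buffer reads: " + countReads(fileSize, fileSize));
    }
}
```

With a buffer as large as the file, the loop issues one read call; with a small buffer the number of calls grows in proportion to the file size, which is the cost the change above aims to avoid for the index file.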
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? NA
- [ ] Testing done
Added unit tests (UT).
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA. NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ajantha-bhat/carbondata master_new
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2345.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2345
----
commit 59b248bbef97010dc2f5dc697400bb2f85799425
Author: ajantha-bhat <ajanthabhat@...>
Date: 2018-05-27T17:19:23Z
Improve Carbon Reader Schema reading on S3
----