GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/2531
[HOTFIX] Improved BlockDataMap caching performance during first time query
Things done as part of this PR
1. Created the taskSummary and FileFooterEntry schemas once and stored them in
member variables. Creating the schemas on every call was a costly operation
that increased the time taken to prune dataMaps.
2. Used TreeMap instead of HashMap while adding the complete file path and
data to the map during the merge file read. Using TreeMap improved the map
filling performance by 10 sec for 1200 entries.
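The two changes above can be illustrated with a minimal sketch. It is not the code from this PR: the class name SchemaCacheSketch, the column names and the collectByPath helper are hypothetical, and the schema is reduced to a list of strings. It only shows the general pattern of building a schema once and keeping it in a member variable, and of filling a TreeMap keyed by the complete file path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a data map class; not CarbonData's actual API.
final class SchemaCacheSketch {

  // Counts how many times the schema was actually constructed.
  int schemaBuilds = 0;

  // Built on first use, then reused by every later call.
  private List<String> taskSummarySchema;

  List<String> getTaskSummarySchema() {
    if (taskSummarySchema == null) {
      taskSummarySchema = buildTaskSummarySchema();
    }
    return taskSummarySchema;
  }

  // Placeholder for the costly schema construction (column names are made up).
  private List<String> buildTaskSummarySchema() {
    schemaBuilds++;
    List<String> columns = new ArrayList<>();
    columns.add("minValues");
    columns.add("maxValues");
    columns.add("filePath");
    return Collections.unmodifiableList(columns);
  }

  // Adds each complete file path and its data to a TreeMap,
  // which keeps the entries sorted by path.
  static Map<String, byte[]> collectByPath(String[] paths, byte[][] data) {
    Map<String, byte[]> byPath = new TreeMap<>();
    for (int i = 0; i < paths.length; i++) {
      byPath.put(paths[i], data[i]);
    }
    return byPath;
  }

  public static void main(String[] args) {
    SchemaCacheSketch sketch = new SchemaCacheSketch();
    sketch.getTaskSummarySchema();
    sketch.getTaskSummarySchema();
    System.out.println("schema builds: " + sketch.schemaBuilds);

    Map<String, byte[]> byPath = collectByPath(
        new String[] {"/store/part-1.carbonindex", "/store/part-0.carbonindex"},
        new byte[][] {new byte[] {1}, new byte[] {0}});
    System.out.println("paths in map order: " + byPath.keySet());
  }
}
```

The sketch is not thread-safe; if the schema getter can be called concurrently, the lazy initialization would need synchronization or eager construction.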
- [ ] Any interfaces changed?
No
- [ ] Any backward compatibility impacted?
No
- [ ] Document update required?
No
- [ ] Testing done
Verified manually
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishgupta88/carbondata query_perf
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2531.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2531
----
commit 26954b88d606535349f83f80a3e00f9b2db4fd66
Author: manishgupta88 <tomanishgupta18@...>
Date: 2018-07-19T13:45:12Z
Code modification done to improve query performance
----
---