GitHub user manishgupta88 opened a pull request:
https://github.com/apache/carbondata/pull/1830
[CARBONDATA-2051] Added like query ends with and contains with filter push
down support to carbondata
**Problem**
Currently, only like filters with a starts-with expression are pushed down to
carbondata. For ends-with and contains like filters, all the data is returned
to spark, and spark then applies the filter to it.
This behavior is acceptable for queries that select only a few columns. But as
the number of columns and the data volume grow, performance suffers because
more data is sent to spark, which increases IO.
If the like filter is pushed down, the filter column is read first and blocks
are pruned. In this case the data returned to spark has already been filtered,
and only the blocklets containing matching data are fully read. This reduces IO
and improves query performance.
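
To make the IO argument concrete, here is a minimal sketch of the read order described above: evaluate the filter column first, skip blocklets with no match, and read the remaining columns only for blocklets that matched. This is not the CarbonData implementation. The `Blocklet` class, the `scanEndsWith` method and the `fullyReadBlocklets` counter are hypothetical names invented for illustration, and the "payload" list merely stands in for all other queried columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LikePushDownSketch {

    /** Hypothetical blocklet: the filter column plus a stand-in for all other columns. */
    static final class Blocklet {
        final List<String> filterColumn;
        final List<String> payload;

        Blocklet(List<String> filterColumn, List<String> payload) {
            this.filterColumn = filterColumn;
            this.payload = payload;
        }
    }

    /** Counts blocklets whose payload was read, to make the IO saving visible. */
    static int fullyReadBlocklets = 0;

    /** Returns payload rows whose filter-column value ends with the given suffix. */
    static List<String> scanEndsWith(List<Blocklet> blocklets, String suffix) {
        List<String> result = new ArrayList<>();
        for (Blocklet blocklet : blocklets) {
            // Step 1: read only the filter column and collect matching row indices.
            List<Integer> matches = new ArrayList<>();
            for (int i = 0; i < blocklet.filterColumn.size(); i++) {
                if (blocklet.filterColumn.get(i).endsWith(suffix)) {
                    matches.add(i);
                }
            }
            // Step 2: prune the blocklet when no row matched.
            if (matches.isEmpty()) {
                continue;
            }
            // Step 3: only now read the other columns, and return matching rows only.
            fullyReadBlocklets++;
            for (int i : matches) {
                result.add(blocklet.payload.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Blocklet> data = Arrays.asList(
            new Blocklet(Arrays.asList("alpha", "beta"), Arrays.asList("row0", "row1")),
            new Blocklet(Arrays.asList("gamma", "delta"), Arrays.asList("row2", "row3")),
            new Blocklet(Arrays.asList("one", "two"), Arrays.asList("row4", "row5")));
        System.out.println(scanEndsWith(data, "ta"));   // prints [row1, row3]
        System.out.println(fullyReadBlocklets);         // prints 2
    }
}
```

In this toy data set the third blocklet is never fully read, which is the saving the description refers to. Without push down, every row of every blocklet would be handed to the engine before filtering.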
**Solution**
Modify the code to push down like queries with ends-with and contains filters.
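
As an illustration of the three like shapes discussed here, the sketch below classifies a pattern string as starts-with, ends-with or contains. It is an assumption-laden example, not code from this pull request: `LikePatternSketch`, `Kind` and `classify` are hypothetical names, escape characters are ignored, and any pattern with a wildcard inside the literal is reported as `OTHER`.

```java
public class LikePatternSketch {

    /** Hypothetical classification of a like pattern. */
    enum Kind { STARTS_WITH, ENDS_WITH, CONTAINS, OTHER }

    /** True if the text still contains a like wildcard ('%' or '_'). */
    static boolean hasWildcard(String text) {
        return text.indexOf('%') >= 0 || text.indexOf('_') >= 0;
    }

    /** Classifies simple patterns such as "abc%", "%abc" and "%abc%". */
    static Kind classify(String pattern) {
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.endsWith("%");
        int from = leading ? 1 : 0;
        int to = trailing ? pattern.length() - 1 : pattern.length();
        // Empty literal, e.g. "", "%" or "%%": nothing useful to match on.
        if (to <= from) {
            return Kind.OTHER;
        }
        String literal = pattern.substring(from, to);
        // Wildcards inside the literal, e.g. "a%c" or "a_c", are out of scope here.
        if (hasWildcard(literal)) {
            return Kind.OTHER;
        }
        if (leading && trailing) {
            return Kind.CONTAINS;
        }
        if (leading) {
            return Kind.ENDS_WITH;
        }
        if (trailing) {
            return Kind.STARTS_WITH;
        }
        // No wildcard at all: this is plain equality, not a like shape.
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("abc%"));   // prints STARTS_WITH
        System.out.println(classify("%abc"));   // prints ENDS_WITH
        System.out.println(classify("%abc%"));  // prints CONTAINS
    }
}
```

Before this change only the first shape was pushed down, and this pull request adds the other two.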
- [ ] Any interfaces changed?
No
- [ ] Any backward compatibility impacted?
No
- [ ] Document update required?
No
- [ ] Testing done
Added a test case to verify that the push down happens
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/manishgupta88/carbondata like_query_pushdown
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1830.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1830
----
commit bff9fbf316941d0c732c04d3c9b7e775285ec893
Author: manishgupta88 <tomanishgupta18@...>
Date: 2018-01-18T08:53:17Z
Added like query ends with and contains with filter push down support to
carbondata
----
---