[
https://issues.apache.org/jira/browse/ORC-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pavan Lanka reassigned ORC-1027:
--------------------------------
> Filter processing to allow filter injections that cannot be represented via
> SArgs
> ---------------------------------------------------------------------------------
>
> Key: ORC-1027
> URL: https://issues.apache.org/jira/browse/ORC-1027
> Project: ORC
> Issue Type: Improvement
> Components: Java
> Affects Versions: 1.7.0, 1.8.0
> Reporter: Pavan Lanka
> Assignee: Pavan Lanka
> Priority: Major
>
> Currently, in the ORCRecordReader, the filter logic that performs LazyIO
> receives the following inputs:
> * SearchArgument as passed by the client using
> `Reader.Options.getSearchArgument`
> * Input filter as passed by the client using
> `Reader.Options.getFilterCallback`
> The SearchArgument is particularly convenient because it allows easy
> integration with existing engines such as Spark without requiring any code
> changes in the engine. However, this pushdown is limited to what can be
> represented via SearchArguments. For example, any predicate that uses a
> function cannot be pushed down.
> {quote}SELECT * FROM table WHERE lower(f1) IN ... OR f2 IN ... OR f3 IN ...
> {quote}
> For the above query, none of the filters are pushed down to ORC from the
> engine, because there is no means of representing functions and OR is used
> to combine the multiple predicates.
> An additional input mechanism is requested for supplying filters that is
> pluggable without requiring a direct change in the clients. We are proposing
> the use of **ServiceLoader** to dynamically determine the desired filters for
> a given fully qualified file path.
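A minimal sketch of the lookup described above, using only `java.util.ServiceLoader`. The names `PluginFilterLookup`, `FilterProvider`, and `filterFor` are hypothetical and are not taken from the ORC code base. Modelling a filter as a row-level `Predicate<Object[]>` is also a simplifying assumption; the actual reader may well operate on batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.function.Predicate;

final class PluginFilterLookup {

  /**
   * Hypothetical service interface. A plugin jar would list its
   * implementations in a META-INF/services file named after this interface.
   */
  interface FilterProvider {
    /** Returns a filter for the given file, or empty if none applies. */
    Optional<Predicate<Object[]>> filterFor(String fullyQualifiedPath);
  }

  /** Asks every provider visible to ServiceLoader for a filter on this path. */
  static List<Predicate<Object[]>> discover(String fullyQualifiedPath) {
    return collect(ServiceLoader.load(FilterProvider.class), fullyQualifiedPath);
  }

  /** Collects the filters that the given providers supply for this path. */
  static List<Predicate<Object[]>> collect(Iterable<FilterProvider> providers,
                                           String fullyQualifiedPath) {
    List<Predicate<Object[]>> found = new ArrayList<>();
    for (FilterProvider provider : providers) {
      provider.filterFor(fullyQualifiedPath).ifPresent(found::add);
    }
    return found;
  }
}
```

With no provider registered on the classpath, `discover` returns an empty list, so a reader built this way would behave exactly as it does today unless a plugin is deployed.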
> This filter, if determined, is applied as an AND in conjunction with the
> other available filters. It is understood that the plugin filter cannot
> differentiate multiple aliases for the same table.
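The AND combination described above could look roughly like the following sketch. `AndFilters` and `allOf` are hypothetical names, and representing each filter source as a `java.util.function.Predicate` is an assumption made for illustration. A `null` entry stands for an input that was not supplied, for example when no plugin filter was determined for the file.

```java
import java.util.function.Predicate;

final class AndFilters {

  /**
   * Combines the supplied filters with logical AND.
   * Null entries are skipped; with no filters at all, every row passes.
   */
  @SafeVarargs
  static <T> Predicate<T> allOf(Predicate<T>... filters) {
    Predicate<T> combined = row -> true;
    for (Predicate<T> filter : filters) {
      if (filter != null) {
        combined = combined.and(filter);
      }
    }
    return combined;
  }
}
```

Because the filters are ANDed, a plugin filter can only narrow the rows that the other filters already accept; it cannot re-admit a row that they rejected.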
> This generic capability will allow us to represent complex filters that
> currently cannot be pushed down to the storage layer from the existing
> engines, allowing us to reap the benefits of LazyIO in many cases.