[ 
https://issues.apache.org/jira/browse/ORC-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated ORC-1027:
-------------------------------
    Affects Version/s:     (was: 1.7.0)

> Filter processing to allow filter injections that cannot be represented via 
> SArgs
> ---------------------------------------------------------------------------------
>
>                 Key: ORC-1027
>                 URL: https://issues.apache.org/jira/browse/ORC-1027
>             Project: ORC
>          Issue Type: Improvement
>          Components: Java
>    Affects Versions: 1.8.0
>            Reporter: Pavan Lanka
>            Assignee: Pavan Lanka
>            Priority: Major
>
> Currently, in the ORCRecordReader, the filter logic that performs LazyIO 
> receives the following inputs:
>  * SearchArgument as passed by the client using 
> `{color:#ff0000}Reader.Options.getSearchArgument{color}`
>  * Input filter as passed by the client using 
> `{color:#ff0000}Reader.Options.getFilterCallback{color}`
> The SearchArgument is particularly convenient because it integrates easily 
> with existing engines such as Spark without requiring any code changes in 
> the engine. However, this pushdown is limited to what can be represented 
> via SearchArguments. For example, any predicate that uses a function 
> cannot be pushed down.
> {quote}SELECT * FROM table WHERE {color:#ff0000}lower{color}(f1) IN ... 
> {color:#FF0000}OR{color} f2 IN ... {color:#FF0000}OR{color} f3 IN ...
> {quote}
> For the above query, none of the filters are pushed down to ORC from the 
> engine: we have no means of representing functions, and because the 
> predicates are combined with OR, the remaining ones cannot be pushed down 
> on their own.
> An additional input mechanism is requested for supplying filters that is 
> pluggable and does not require direct changes to the clients. We are 
> proposing the use of Java **ServiceLoader** to dynamically determine the 
> desired filters for a given fully qualified file path.
> This filter, if one is found, is combined using AND with the other 
> available filters. It is understood that the plugin filter cannot 
> differentiate between multiple aliases of the same table.
> This generic capability will allow us to represent complex filters that 
> currently cannot be pushed down to the storage layer from the existing 
> engines, allowing us to reap the benefits of LazyIO in many cases.
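
The proposal above (discover filters through ServiceLoader by file path, then AND them with the existing filters) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the ORC implementation: `FilterProvider`, `LowerF1Provider`, `discover` and `combine` are hypothetical names, the `/warehouse/t1/` path is made up, and a row is modelled as a plain `String[]` of (f1, f2, f3) instead of a vectorized batch. Only `java.util.ServiceLoader` and `java.util.function.Predicate` are real JDK APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ServiceLoader;
import java.util.Set;
import java.util.function.Predicate;

final class PluginFilterSketch {

  /** Hypothetical SPI: returns a row filter for a file path, or null if none applies. */
  public interface FilterProvider {
    Predicate<String[]> filterFor(String filePath);
  }

  /**
   * Hypothetical provider for files under /warehouse/t1/ (an assumed path).
   * Expresses lower(f1) IN ('a', 'b') OR f2 IN ('x'), which uses a function
   * and therefore cannot be written as a SearchArgument.
   */
  public static class LowerF1Provider implements FilterProvider {
    private static final Set<String> F1_VALUES = Set.of("a", "b");
    private static final Set<String> F2_VALUES = Set.of("x");

    @Override
    public Predicate<String[]> filterFor(String filePath) {
      if (!filePath.startsWith("/warehouse/t1/")) {
        return null; // this plugin has nothing to say about other tables
      }
      return row -> F1_VALUES.contains(row[0].toLowerCase(Locale.ROOT))
          || F2_VALUES.contains(row[1]);
    }
  }

  /**
   * Discovers providers registered through the standard ServiceLoader mechanism
   * (a META-INF/services entry named after the SPI interface). With no
   * registration on the classpath this simply returns an empty list.
   */
  static List<FilterProvider> discover() {
    List<FilterProvider> found = new ArrayList<>();
    for (FilterProvider provider : ServiceLoader.load(FilterProvider.class)) {
      found.add(provider);
    }
    return found;
  }

  /** ANDs every plugin filter that applies to the path onto the existing filter. */
  static Predicate<String[]> combine(
      Predicate<String[]> existing, List<FilterProvider> providers, String filePath) {
    Predicate<String[]> result = existing;
    for (FilterProvider provider : providers) {
      Predicate<String[]> extra = provider.filterFor(filePath);
      if (extra != null) {
        result = result.and(extra);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Stand-in for the client-supplied filter (SearchArgument / filter callback).
    Predicate<String[]> existing = row -> !row[2].isEmpty();
    // Providers are passed explicitly here; discover() would be used once a
    // provider is registered under META-INF/services.
    List<FilterProvider> providers = List.of(new LowerF1Provider());
    Predicate<String[]> filter =
        combine(existing, providers, "/warehouse/t1/part-0.orc");
    System.out.println(filter.test(new String[] {"A", "y", "k"}));
    System.out.println(filter.test(new String[] {"z", "y", "k"}));
  }
}
```

Because the lookup is keyed only on the file path, two aliases of the same table resolve to the same files and so receive the same plugin filter, which is the limitation noted in the description.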



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
