[ https://issues.apache.org/jira/browse/SPARK-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998101#comment-14998101 ]

Hyukjin Kwon commented on SPARK-10978:
--------------------------------------

I think we should add another interface such as {{def 
partiallyHandledFilters}}. For example, the ORC data source does not filter 
record by record; it only produces rough results (it skips whole stripes/row 
groups based on statistics). In that case, the Spark-side filter should still 
be applied.
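
To make that concrete, here is a rough sketch of what such an interface could 
look like (the trait and method below are hypothetical, not existing Spark API):

{code}
import org.apache.spark.sql.sources.Filter

// Hypothetical sketch: alongside unhandledFilters, a source could also
// report filters it evaluates only approximately (e.g. ORC, which uses
// min/max statistics to skip whole stripes/row groups rather than
// filtering record by record). Spark would re-apply exactly these
// filters on its side, while skipping the fully handled ones.
trait PartiallyHandledFilterSupport {
  def partiallyHandledFilters(filters: Array[Filter]): Array[Filter] = filters
}
{code}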

I manually added some code to {{def unhandledFilters}} for the Parquet and ORC 
data sources, and I could reproduce the wrong results for ORC files.
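
Roughly, the experiment amounts to the following (a sketch under my 
assumptions, not the actual diff):

{code}
import org.apache.spark.sql.sources.Filter

// Sketch: make a relation claim that every pushed-down filter is fully
// handled, so Spark drops its own re-evaluation of those predicates.
def unhandledFilters(filters: Array[Filter]): Array[Filter] =
  Array.empty[Filter]  // "the source handles everything"

// Parquet's reader can filter record by record, so it happens to stay
// correct there, but ORC's pushed SearchArgument only skips stripes/row
// groups (coarse filtering), so rows that should have been removed
// survive, i.e. wrong results.
{code}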

I have been working on this because I thought only ORC filters were not being 
pushed down, but it looks like no filters are being pushed down at all, and I 
guess he is already working on this.

Could I try adding this if it is indeed an issue?

I unintentionally opened this here as I thought it was an issue:
https://issues.apache.org/jira/browse/SPARK-10978

> Allow PrunedFilteredScan to eliminate predicates from further evaluation
> ------------------------------------------------------------------------
>
>                 Key: SPARK-10978
>                 URL: https://issues.apache.org/jira/browse/SPARK-10978
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 1.3.0, 1.4.0, 1.5.0
>            Reporter: Russell Alexander Spitzer
>            Assignee: Cheng Lian
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> Currently PrunedFilteredScan allows implementors to push down predicates to an 
> underlying datasource. This is done solely as an optimization, since the 
> predicate is reapplied on the Spark side as well. This allows for 
> bloom-filter-like (approximate) operations, but ends up redundantly 
> re-evaluating the predicate for those sources which can do accurate pushdowns.
> In addition, it makes it difficult for underlying sources to accept queries 
> which reference non-existent columns in order to provide ancillary 
> functionality. In our case we allow a Solr query to be passed in via a 
> non-existent solr_query column. Since this column is not returned, nothing 
> passes when Spark re-applies the filter on "solr_query". 
> Suggestion on the ML from [~marmbrus] 
> {quote}
> We have to try and maintain binary compatibility here, so probably the 
> easiest thing to do here would be to add a method to the class.  Perhaps 
> something like:
> def unhandledFilters(filters: Array[Filter]): Array[Filter] = filters
> By default, this could return all filters so behavior would remain the same, 
> but specific implementations could override it.  There is still a chance that 
> this would conflict with existing methods, but hopefully that would not be a 
> problem in practice.
> {quote}
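> To illustrate how an implementation might use this hook, here is a 
> hypothetical Solr-backed relation (a sketch for clarity, not code from the 
> actual patch):
> {code}
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.{Row, SQLContext}
> import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, PrunedFilteredScan}
> import org.apache.spark.sql.types.StructType
>
> // Hypothetical relation: the solr_query pseudo-column is evaluated
> // entirely by Solr, so Spark is told not to re-apply that filter.
> class SolrRelation(override val sqlContext: SQLContext,
>                    override val schema: StructType)
>   extends BaseRelation with PrunedFilteredScan {
>
>   override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
>     filters.filterNot {
>       case EqualTo("solr_query", _) => true  // fully handled by Solr
>       case _                        => false
>     }
>
>   override def buildScan(requiredColumns: Array[String],
>                          filters: Array[Filter]): RDD[Row] = {
>     // ... translate filters (including solr_query) into a Solr query ...
>     sqlContext.sparkContext.emptyRDD[Row]
>   }
> }
> {code}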


