[ 
https://issues.apache.org/jira/browse/SPARK-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Alexander Spitzer updated SPARK-12639:
----------------------------------------------
    Description: 
SPARK-11661 improves handling of predicate pushdowns but has an unintended 
consequence of making the explain string more confusing.

It makes it seem as if a source always pushes down all of the filters, even 
those it cannot handle. This is confusing (I kept checking my code to see 
where I had broken something).
{code:title=Query plan for source where nothing is handled by C* Source}
Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
+- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
{code}
Although the telltale "Filter" step is present, my first instinct is that the 
underlying source relation is using all of those filters.
{code:title=Query plan for source where everything is handled by C* Source}
Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@55d4456c[a#79,b#80,c#81,d#82,e#83,f#84,g#85,h#86] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
{code}
I think this would be much clearer if we changed the metadata key to 
"HandledFilters" and listed only the filters fully handled by the underlying 
source.

Something like:
{code:title=Proposed explain for pushdown where none of the predicates are handled by the underlying source}
Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
+- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78] HandledFilters: []
{code}
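
The proposal above boils down to reporting only the pushed filters that the source does not hand back as unhandled. The following is a minimal sketch of that idea in plain Python, not Spark code: the `EqualTo` class, `handled_filters`, and `scan_metadata` are hypothetical names introduced for illustration, and it assumes the source can report which filters it cannot handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EqualTo:
    """Hypothetical stand-in for a source filter such as EqualTo(a,1)."""
    attribute: str
    value: int

    def __str__(self) -> str:
        return f"EqualTo({self.attribute},{self.value})"


def handled_filters(pushed: List[EqualTo], unhandled: List[EqualTo]) -> List[EqualTo]:
    """Keep only the pushed filters the source did not report as unhandled."""
    return [f for f in pushed if f not in unhandled]


def scan_metadata(pushed: List[EqualTo], unhandled: List[EqualTo]) -> str:
    """Render the metadata entry that would follow the Scan node (assumed format)."""
    handled = handled_filters(pushed, unhandled)
    return "HandledFilters: [" + ", ".join(str(f) for f in handled) + "]"


pushed = [EqualTo("a", 1), EqualTo("b", 2), EqualTo("c", 1), EqualTo("e", 1)]

# Source handles nothing: every pushed filter comes back as unhandled.
print(scan_metadata(pushed, unhandled=pushed))

# Source handles everything: nothing comes back as unhandled.
print(scan_metadata(pushed, unhandled=[]))
```

With this rule the first case shows an empty list next to the residual "Filter" step, and the second shows all four predicates with no "Filter" step, so the two plans can no longer be mistaken for each other.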



  was:
SPARK-11661 improves handling of predicate pushdowns but has an unintended 
consequence of making the explain string more confusing.

It makes it seem as if a source always pushes down all of the filters, even 
those it cannot handle. This is confusing (I kept checking my code to see 
where I had broken something).
"Query plan for source where nothing is handled by C* Source"
Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
+- Scan 
org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78]
 PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
Although the telltale "Filter" step is present, my first instinct is that the 
underlying source relation is using all of those filters.
"Query plan for source where everything is handled by C* Source"
Scan 
org.apache.spark.sql.cassandra.CassandraSourceRelation@55d4456c[a#79,b#80,c#81,d#82,e#83,f#84,g#85,h#86]
 PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
I think this would be much clearer if we changed the metadata key to 
"HandledFilters" and only listed those handled fully by the underlying source.


> Improve Explain for DataSources with Handled Predicate Pushdowns
> ----------------------------------------------------------------
>
>                 Key: SPARK-12639
>                 URL: https://issues.apache.org/jira/browse/SPARK-12639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Russell Alexander Spitzer
>            Priority: Minor


