Russell Alexander Spitzer created SPARK-12639:
-------------------------------------------------
Summary: Improve Explain for DataSources with Handled Predicate Pushdowns
Key: SPARK-12639
URL: https://issues.apache.org/jira/browse/SPARK-12639
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.6.0
Reporter: Russell Alexander Spitzer
Priority: Minor
SPARK-11661 improves handling of predicate pushdowns but has an unintended
consequence of making the explain string more confusing.
It makes it seem as if a source is always pushing down all of the filters,
even those it cannot handle.
This can be confusing (I kept checking my code to see where I had broken
something).
"Query plan for source where nothing is handled by C* Source"
Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
+- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
Although the telltale "Filter" step is present, my first instinct tells me
that the underlying source relation is using all of those filters.
"Query plan for source where everything is handled by C* Source"
Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@55d4456c[a#79,b#80,c#81,d#82,e#83,f#84,g#85,h#86] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
I think this would be much clearer if we changed the metadata key to
"HandledFilters" and only listed those handled fully by the underlying source.
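To make the proposal concrete, here is a minimal Python model (not Spark code) of how the two metadata keys would differ. `EqualTo` is a hypothetical stand-in for Spark's `org.apache.spark.sql.sources.EqualTo`, and `plan_metadata` is a hypothetical helper. The model assumes the Spark 1.6 contract as I understand it: a source reports the filters it cannot fully evaluate via `BaseRelation.unhandledFilters`, Spark still pushes every filter (per SPARK-11661), and Spark re-applies the unhandled ones in a `Filter` node. It also simplifies by ignoring predicates that cannot be translated into source filters at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualTo:
    """Hypothetical stand-in for Spark's sources.EqualTo filter."""
    attribute: str
    value: object

    def __str__(self) -> str:
        return f"EqualTo({self.attribute},{self.value})"


def _fmt(filters) -> str:
    # Render a filter list the way the explain output above does.
    return "[" + ", ".join(str(f) for f in filters) + "]"


def plan_metadata(pushed, unhandled):
    """Hypothetical model of the scan's explain metadata.

    pushed:    every filter Spark offers to the source (all of them after SPARK-11661).
    unhandled: the subset the source reports back as not fully handled
               (assumed to come from BaseRelation.unhandledFilters).
    Returns (current_key, proposed_key, needs_filter_node).
    """
    unhandled_set = set(unhandled)
    handled = [f for f in pushed if f not in unhandled_set]
    current = f"PushedFilters: {_fmt(pushed)}"      # lists everything, handled or not
    proposed = f"HandledFilters: {_fmt(handled)}"   # lists only fully handled filters
    # Spark keeps a Filter node whenever at least one filter is unhandled.
    return current, proposed, bool(unhandled_set)


if __name__ == "__main__":
    filters = [EqualTo("a", 1), EqualTo("b", 2), EqualTo("c", 1), EqualTo("e", 1)]
    print(plan_metadata(filters, filters))  # nothing handled by the source
    print(plan_metadata(filters, []))       # everything handled by the source
```

In the "nothing handled" case, the current key still lists all four filters, while the proposed key is empty. In the "everything handled" case, both keys list all four. Under the proposal, the two plans in this report would therefore be distinguishable from the Scan line alone.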
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)