Micah Kornfield created SPARK-32095:
---------------------------------------

             Summary: [DataSource V2] Documentation on SupportsReportStatistics Outdated?
                 Key: SPARK-32095
                 URL: https://issues.apache.org/jira/browse/SPARK-32095
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0, 2.4.6
            Reporter: Micah Kornfield


I was wondering if the documentation on SupportsReportStatistics [1][3] about 
its interaction with the planner and predicate pushdowns is still accurate. It 
says:

"Implementations that return more accurate statistics based on pushed operators 
will not improve query performance until the planner can push operators before 
getting stats."

 

Looking through the code, it seems there is now functionality that explicitly 
expects the operators to be pushed down [2]. Is the documentation for 
SupportsReportStatistics referring to something other than [2], or should it 
be updated?
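
To make the question concrete, here is a minimal, self-contained Java sketch of the ordering at issue: whether statistics are requested before or after operators are pushed. It does not use Spark. The names PushdownStatsSketch, Stats, SketchScanBuilder, pushFilter and the keepPercent parameter are all hypothetical stand-ins. Only the shape of Stats (sizeInBytes and numRows returning OptionalLong) is meant to mirror Spark's Statistics interface, and the fixed-percentage selectivity model is an assumption made purely for illustration.

```java
import java.util.OptionalLong;

public class PushdownStatsSketch {

    /** Hypothetical stand-in mirroring the shape of Spark's Statistics interface. */
    interface Stats {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    /**
     * Hypothetical source that narrows its row estimate as filters are pushed.
     * This is not Spark code; it only models the ordering in question.
     */
    static final class SketchScanBuilder {
        private final long bytesPerRow;
        private long estimatedRows;

        SketchScanBuilder(long totalRows, long bytesPerRow) {
            this.estimatedRows = totalRows;
            this.bytesPerRow = bytesPerRow;
        }

        /** Models pushing one filter that is assumed to keep keepPercent% of rows. */
        void pushFilter(int keepPercent) {
            if (keepPercent < 0 || keepPercent > 100) {
                throw new IllegalArgumentException("keepPercent must be in [0, 100]");
            }
            estimatedRows = estimatedRows * keepPercent / 100;
        }

        /** Reports statistics that reflect whatever has been pushed so far. */
        Stats estimateStatistics() {
            final long rows = estimatedRows;
            final long bytes = rows * bytesPerRow;
            return new Stats() {
                @Override public OptionalLong sizeInBytes() { return OptionalLong.of(bytes); }
                @Override public OptionalLong numRows() { return OptionalLong.of(rows); }
            };
        }
    }

    public static void main(String[] args) {
        SketchScanBuilder builder = new SketchScanBuilder(1_000_000L, 100L);

        // Stats requested before pushdown: the planner sees the full table.
        Stats before = builder.estimateStatistics();

        // Stats requested after pushdown: the planner sees the reduced estimate.
        builder.pushFilter(10);
        Stats after = builder.estimateStatistics();

        System.out.println("before: " + before.numRows().getAsLong() + " rows");
        System.out.println("after:  " + after.numRows().getAsLong() + " rows");
    }
}
```

If the planner only ever asks for statistics at the "before" point, a source that reports more accurate post-pushdown numbers gains nothing, which is what the quoted documentation describes. If it asks at the "after" point, as [2] appears to expect, the more accurate numbers would be visible to the planner.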

 

[1] https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/sources/v2/reader/SupportsReportStatistics.html

[2] https://github.com/apache/spark/blob/d0800fc8e2e71a79bf0f72c3e4bc608ae34053e7/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala#L86

[3] https://spark.apache.org/docs/3.0.0-preview/api/java/org/apache/spark/sql/connector/read/SupportsReportStatistics.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
