[spark] branch branch-3.0 updated: [MINOR][DOC] Add class document for PruneFileSourcePartitions and PruneHiveTablePartitions

wenchen Tue, 11 Feb 2020 06:44:13 -0800

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e139bf3  [MINOR][DOC] Add class document for PruneFileSourcePartitions and PruneHiveTablePartitions
e139bf3 is described below

commit e139bf3c1b541b235b9a8f2750a5245ccc902f8a
Author: fuwhu <[email protected]>
AuthorDate: Tue Feb 11 22:16:44 2020 +0800

    [MINOR][DOC] Add class document for PruneFileSourcePartitions and PruneHiveTablePartitions
    
    ### What changes were proposed in this pull request?
    Add class document for PruneFileSourcePartitions and PruneHiveTablePartitions.
    
    ### Why are the changes needed?
    To describe these two classes.
    
    ### Does this PR introduce any user-facing change?
    no
    
    ### How was this patch tested?
    no
    
    Closes #27535 from fuwhu/SPARK-15616-FOLLOW-UP.
    
    Authored-by: fuwhu <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
    (cherry picked from commit f1d0dce4848a53831268c80bf7e1e0f47a1f7612)
    Signed-off-by: Wenchen Fan <[email protected]>
---
 .../execution/datasources/PruneFileSourcePartitions.scala   | 13 +++++++++++++
 .../spark/sql/hive/execution/PruneHiveTablePartitions.scala |  8 ++++++++
 2 files changed, 21 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala
index 1ea19c1..a7129fb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala
@@ -26,6 +26,19 @@ import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution.datasources.v2.{DataSourceV2ScanRelation, FileScan}
 import org.apache.spark.sql.types.StructType
 
+/**
+ * Prune the partitions of file source based table using partition filters. Currently, this rule
+ * is applied to [[HadoopFsRelation]] with [[CatalogFileIndex]] and [[DataSourceV2ScanRelation]]
+ * with [[FileScan]].
+ *
+ * For [[HadoopFsRelation]], the location will be replaced by pruned file index, and corresponding
+ * statistics will be updated. And the partition filters will be kept in the filters of returned
+ * logical plan.
+ *
+ * For [[DataSourceV2ScanRelation]], both partition filters and data filters will be added to
+ * its underlying [[FileScan]]. And the partition filters will be removed in the filters of
+ * returned logical plan.
+ */
 private[sql] object PruneFileSourcePartitions extends Rule[LogicalPlan] {
 
   private def getPartitionKeyFiltersAndDataFilters(
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
index a0349f6..da6e4c5 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
@@ -30,6 +30,14 @@ import org.apache.spark.sql.execution.datasources.DataSourceStrategy
 import org.apache.spark.sql.internal.SQLConf
 
 /**
+ * Prune hive table partitions using partition filters on [[HiveTableRelation]]. The pruned
+ * partitions will be kept in [[HiveTableRelation.prunedPartitions]], and the statistics of
+ * the hive table relation will be updated based on pruned partitions.
+ *
+ * This rule is executed in optimization phase, so the statistics can be updated before physical
+ * planning, which is useful for some spark strategy, eg.
+ * [[org.apache.spark.sql.execution.SparkStrategies.JoinSelection]].
+ *
  * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
  */
 private[sql] class PruneHiveTablePartitions(session: SparkSession)
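
The class comments above describe the same basic idea for both rules: separate the filters that touch partition columns from the rest, keep only the partitions that satisfy them, and recompute statistics from what remains. The following is a minimal toy sketch of that idea in plain Scala. It is not Spark's implementation: `Partition`, `Predicate`, `Relation`, and `PruningSketch` are hypothetical names invented for illustration, and only equality predicates are modeled.

```scala
// Toy model only: these types are hypothetical and are NOT Spark classes.
final case class Partition(spec: Map[String, String], sizeInBytes: Long)
final case class Predicate(column: String, value: String) // models `column = value`
final case class Relation(partitionColumns: Set[String], partitions: Seq[Partition]) {
  // Statistics are derived from whichever partitions are currently attached.
  def sizeInBytes: Long = partitions.map(_.sizeInBytes).sum
}

object PruningSketch {
  // Split filters into (partition filters, data filters) by the column they reference.
  def splitFilters(rel: Relation, filters: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
    filters.partition(f => rel.partitionColumns.contains(f.column))

  // Keep only the partitions whose spec satisfies every partition filter.
  // Data filters are ignored here; they would still be evaluated on rows later.
  def prune(rel: Relation, filters: Seq[Predicate]): Relation = {
    val (partitionFilters, _) = splitFilters(rel, filters)
    val kept = rel.partitions.filter { p =>
      partitionFilters.forall(f => p.spec.get(f.column).contains(f.value))
    }
    rel.copy(partitions = kept)
  }

  def main(args: Array[String]): Unit = {
    val rel = Relation(
      partitionColumns = Set("dt"),
      partitions = Seq(
        Partition(Map("dt" -> "2020-02-10"), 100L),
        Partition(Map("dt" -> "2020-02-11"), 250L)))
    val filters = Seq(Predicate("dt", "2020-02-11"), Predicate("id", "7"))
    val pruned = prune(rel, filters)
    println(s"before: ${rel.sizeInBytes} bytes, after: ${pruned.sizeInBytes} bytes")
  }
}
```

In this sketch the size estimate shrinks once the non-matching partition is dropped, which is the kind of updated statistic the comment on PruneHiveTablePartitions says is useful to have before physical planning, for example when choosing a join strategy.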

