[jira] [Resolved] (SPARK-16689) FileSourceStrategy: Pruning Partition Columns When No Partition Column Exist in Project

Sean Owen (JIRA) Mon, 07 Nov 2016 12:29:10 -0800

     [ https://issues.apache.org/jira/browse/SPARK-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen resolved SPARK-16689.
-------------------------------
    Resolution: Won't Fix

> FileSourceStrategy: Pruning Partition Columns When No Partition Column Exist in Project
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-16689
>                 URL: https://issues.apache.org/jira/browse/SPARK-16689
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Xiao Li
>
> For partitioned file sources, the current implementation always scans all the
> partition columns. However, this is not necessary when the projected column
> list does not include any partition column. In that case, the unnecessary
> Project on top of the scan can also be avoided.
> Below is an example:
> {noformat}
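> // N (row count) and tempDir (output path) are assumed to be defined beforehand.
> spark
>   .range(N)
>   .selectExpr("id AS value1", "id AS value2", "id AS p1", "id AS p2", "id AS p3")
>   .toDF("value", "value2", "p1", "p2", "p3").write.format("json")
>   .partitionBy("p1", "p2", "p3").save(tempDir)
> // explain() prints the physical plans shown below
> spark.read.format("json").load(tempDir).selectExpr("value").explain()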
> {noformat}
> Before the PR's changes, the physical plan looks like this:
> {noformat}
> == Physical Plan ==
> *Project [value#37L]
> +- *Scan json [value#37L,p1#39,p2#40,p3#41] Format: JSON, InputPaths: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/spark-f7a4294a-2e1b-4f44-9ebb-1a5eb..., PushedFilters: [], ReadSchema: struct<value:bigint>
> {noformat}
> After the PR's changes, the physical plan becomes:
> {noformat}
> == Physical Plan ==
> *Scan json [value#147L] Format: JSON, InputPaths: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzw0000gn/T/spark-a5bcb14a-46c2-4c20-8f34-9662b..., PushedFilters: [], ReadSchema: struct<value:bigint>
> {noformat}
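The planning change proposed in this issue can be modeled in a few lines of plain Scala. This is only a sketch under stated assumptions: the `Plan`, `Scan`, and `Project` types and the `planBefore`/`planAfter` helpers are hypothetical and are not Spark's actual `FileSourceStrategy` code or physical operators. The sketch shows only which columns the scan reads and when the extra Project can be dropped.

```scala
// Hypothetical, simplified model of the planning decision discussed in SPARK-16689.
// These types are NOT Spark's real classes; they only illustrate which columns
// the file scan outputs and whether a Project is still needed on top of it.
sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Project(columns: Seq[String], child: Plan) extends Plan

object PartitionPruningSketch {
  // Behaviour before the change: the scan always carries every partition column,
  // so a Project is placed on top to narrow the output to the requested columns.
  def planBefore(projected: Seq[String], partitionCols: Seq[String]): Plan = {
    val dataCols = projected.filterNot(partitionCols.contains)
    val scan = Scan(dataCols ++ partitionCols)
    Project(projected, scan)
  }

  // Proposed behaviour: keep only the partition columns the projection references,
  // and skip the Project when the scan output already matches the projection.
  def planAfter(projected: Seq[String], partitionCols: Seq[String]): Plan = {
    val dataCols = projected.filterNot(partitionCols.contains)
    val neededPartitionCols = partitionCols.filter(projected.contains)
    val scan = Scan(dataCols ++ neededPartitionCols)
    if (scan.columns == projected) scan else Project(projected, scan)
  }
}
```

For the example in the issue (projection `value`, partition columns `p1`, `p2`, `p3`), `planBefore` yields a Project over a four-column Scan, mirroring the "before" plan, while `planAfter` yields a bare Scan of `value`, mirroring the "after" plan. If the projection does reference a partition column in a different order from the scan output, the sketch keeps a Project to restore the requested column order.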



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

