[ 
https://issues.apache.org/jira/browse/SPARK-10395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-10395.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 8553
[https://github.com/apache/spark/pull/8553]

> Simplify CatalystReadSupport
> ----------------------------
>
>                 Key: SPARK-10395
>                 URL: https://issues.apache.org/jira/browse/SPARK-10395
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> The Parquet {{ReadSupport}} API is more complicated than it needs to be, 
> for historical reasons.  In older versions of parquet-mr (1.6.0rc3 and 
> earlier), {{ReadSupport}} had to be instantiated and initialized twice, once 
> on the driver side and once on the executor side.  The {{init()}} method was 
> for driver-side initialization, while {{prepareForRead()}} was for the 
> executor side.  Starting from parquet-mr 1.6.0, however, this is no longer 
> the case: {{ReadSupport}} is only instantiated and initialized on the 
> executor side.  So, in theory, the two methods could now be combined into a 
> single initialization method.  The only reason (that I can think of) for 
> keeping both is parquet-mr API backwards compatibility.
> As a result, we no longer need to rely on {{ReadContext}} to pass the 
> requested schema from {{init()}} to {{prepareForRead()}}; a private 
> {{var}} for the requested schema in {{CatalystReadSupport}} is enough.
> In addition, now that the old Parquet support code has been removed, we 
> always set the Catalyst requested schema properly when reading Parquet files, 
> so all of the "fallback" logic in {{CatalystReadSupport}} is now redundant.
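
A minimal sketch of the idea described above, not the code from the pull request. The types {{Schema}}, {{ReadContext}} and {{ReadSupport}} below are simplified, hypothetical stand-ins for the parquet-mr classes (the real ones have different signatures), and the {{requested.columns}} configuration key is invented for illustration. It only shows that, once both methods run on the same executor-side instance, the requested schema can be kept in a private {{var}} instead of being passed through {{ReadContext}}.

```scala
// Hypothetical, simplified stand-ins for the parquet-mr types.
// The real classes live in parquet-mr and have richer signatures.
final case class Schema(fields: Seq[String])
final case class ReadContext(requestedSchema: Schema)

abstract class ReadSupport[T] {
  def init(conf: Map[String, String], fileSchema: Schema): ReadContext
  def prepareForRead(conf: Map[String, String],
                     fileSchema: Schema,
                     context: ReadContext): T
}

// Sketch only: both methods are called on the same instance, so the
// requested schema computed in init() can be remembered in a private var.
final class SketchReadSupport extends ReadSupport[Seq[String]] {
  private var requestedSchema: Option[Schema] = None

  override def init(conf: Map[String, String], fileSchema: Schema): ReadContext = {
    // Assumption: requested columns arrive as a comma-separated config value.
    val wanted = conf("requested.columns").split(",").map(_.trim).toSet
    val schema = Schema(fileSchema.fields.filter(f => wanted.contains(f)))
    requestedSchema = Some(schema)
    // Still returned, because the API requires a ReadContext.
    ReadContext(schema)
  }

  override def prepareForRead(conf: Map[String, String],
                              fileSchema: Schema,
                              context: ReadContext): Seq[String] = {
    // Read the schema from the private var rather than from `context`.
    val schema = requestedSchema.getOrElse(
      throw new IllegalStateException("init() must be called before prepareForRead()"))
    schema.fields
  }
}
```

In this sketch {{prepareForRead()}} ignores the {{context}} argument entirely, which is the point of the simplification: the {{ReadContext}} is still returned from {{init()}} only because the API requires it.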



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
