[
https://issues.apache.org/jira/browse/SPARK-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yin Huai resolved SPARK-13912.
------------------------------
Resolution: Duplicate
Assignee: Reynold Xin
https://github.com/apache/spark/pull/12689 and
https://github.com/apache/spark/pull/12688 together resolve this issue.
> spark.hadoop.* configurations are not applied for Parquet Data Frame Readers
> ----------------------------------------------------------------------------
>
> Key: SPARK-13912
> URL: https://issues.apache.org/jira/browse/SPARK-13912
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Matt Cheah
> Assignee: Reynold Xin
>
> I populated the SparkConf passed to my SparkContext with some spark.hadoop.*
> configurations, expecting them to be applied to the underlying Hadoop file
> reads whenever I read from my DFS. However, when running some jobs, I noticed
> that the configurations were not being applied to DataFrame reads done via
> sqlContext.read().parquet().
> I looked in the codebase and noticed that SqlNewHadoopRDD uses neither the
> SparkConf nor the SparkContext's Hadoop configuration to set up the Hadoop
> reading; instead, it uses SparkHadoopUtil.get.conf. That Hadoop configuration
> object won't have the Hadoop configurations that were set on the
> SparkContext. In general we seem to have a discrepancy in how we apply Hadoop
> configurations: when reading raw RDDs via e.g. SparkContext.textFile() we
> take the Hadoop configuration from the SparkContext, but for DataFrames we
> use SparkHadoopUtil.get.conf.
> We should probably use the SparkContext's Hadoop configuration for DataFrames
> as well.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)