[
https://issues.apache.org/jira/browse/SPARK-30328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-30328.
----------------------------------
Resolution: Invalid
> Fail to write local files with RDD.saveAsTextFile when setting the incorrect
> Hadoop configuration files
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-30328
> URL: https://issues.apache.org/jira/browse/SPARK-30328
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: chendihao
> Priority: Major
>
> We find that incorrect Hadoop configuration files cause saving an RDD to the
> local file system to fail. This is unexpected because we have specified a
> local URL, and the DataFrame.write.text API does not have this issue.
> It is easy to reproduce and verify with Spark 2.3.0.
> 1. Do not set the `HADOOP_CONF_DIR` environment variable.
> 2. Install pyspark and run the following local Python script. This should work
> and save files to the local file system.
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local").getOrCreate()
> sc = spark.sparkContext
> rdd = sc.parallelize([1, 2, 3])
> rdd.saveAsTextFile("file:///tmp/rdd.text")
> {code}
> 3. Set the `HADOOP_CONF_DIR` environment variable and put the Hadoop
> configuration files there. Make sure `core-site.xml` is well-formed
> but contains an unresolved host name.
> 4. Run the same Python script again. When it tries to connect to HDFS and
> finds the unresolved host name, a Java exception is thrown.
> We think `saveAsTextFile("file:///...")` should not attempt to connect to HDFS
> whether or not `HADOOP_CONF_DIR` is set. In fact, the following DataFrame
> code works with the same incorrect Hadoop configuration files.
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local").getOrCreate()
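> # Placeholder rows (not in the original report) so the snippet runs as written.
> rows = [("a", "1"), ("b", "2")]
> df = spark.createDataFrame(rows, ["attribute", "value"])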
> df.write.parquet("file:///tmp/df.parquet")
> {code}
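
Step 3 of the quoted report (a well-formed `core-site.xml` that points at an
unresolved host) could be prepared with a small helper like the sketch below.
It is not part of the original report. It assumes the relevant property is
`fs.defaultFS`, and the host name `unresolved-host.invalid`, the port 8020,
and the helper names `write_core_site` and `read_default_fs` are placeholders
chosen for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_core_site(conf_dir, default_fs):
    """Write a minimal, well-formed core-site.xml into conf_dir; return its path."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    # Assumption: fs.defaultFS is the property that carries the HDFS host name.
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    path = os.path.join(conf_dir, "core-site.xml")
    ET.ElementTree(configuration).write(path, encoding="utf-8", xml_declaration=True)
    return path


def read_default_fs(path):
    """Return the fs.defaultFS value from a core-site.xml file, or None if absent."""
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    conf_dir = tempfile.mkdtemp(prefix="hadoop-conf-")
    # Hypothetical host name: the reserved .invalid TLD should not resolve.
    path = write_core_site(conf_dir, "hdfs://unresolved-host.invalid:8020")
    print("export HADOOP_CONF_DIR=" + conf_dir)
    print("fs.defaultFS =", read_default_fs(path))
```

Exporting `HADOOP_CONF_DIR` to the printed directory before rerunning the RDD
script should correspond to steps 3 and 4 of the report.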