[jira] [Created] (SPARK-25292) Dataframe write to csv saves part files in outputDirectory/task-xx/part-xxx instead of directly saving in outputDir

omkar puttagunta (JIRA) Thu, 30 Aug 2018 21:45:43 -0700

omkar puttagunta created SPARK-25292:
----------------------------------------


             Summary: Dataframe write to csv saves part files in 
outputDirectory/task-xx/part-xxx instead of directly saving in outputDir
                 Key: SPARK-25292
                 URL: https://issues.apache.org/jira/browse/SPARK-25292
             Project: Spark
          Issue Type: Bug
          Components: EC2, Java API, Spark Shell, Spark Submit
    Affects Versions: 2.0.2
            Reporter: omkar puttagunta


[https://stackoverflow.com/questions/52108335/why-spark-dataframe-writes-part-files-to-temporary-in-instead-directly-creating]
{quote}Running Spark 2.0.2 in Standalone Cluster Mode; 2 workers and 1 master 
node
{quote}
Simple test: reading a pipe-delimited file and writing the data to CSV. The 
commands below are executed in spark-shell with the master URL set.

{{val df = spark.sqlContext.read.option("delimiter","|").option("quote","\u0000").csv("/home/input-files/")}}
{{val emailDf = df.filter("_c3='EML'")}}
{{emailDf.repartition(100).write.csv("/opt/outputFile/")}}

After executing the commands above, the output differs between the two workers:
{quote}In {{worker1}} -> Each part file is created in 
{{/opt/outputFile/_temporary/task-xxxxx-xxx/part-xxx-xxx}}
In {{worker2}} -> {{/opt/outputFile/part-xxx}} => part files are generated 
directly under the output directory specified during write.
{quote}
*The same thing happens with coalesce(100), or without specifying 
repartition/coalesce at all.*
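
To compare the two layouts described above, a small helper can list which part files sit directly under the output directory and which are still under {{_temporary}}. This is only a minimal sketch: {{PartFileLayout}} and {{partFiles}} are hypothetical names, not Spark APIs, and the code assumes the output path is on the local filesystem of the machine where it runs, so it would have to be run separately on each worker host.

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Hypothetical helper (not part of Spark): inspects a local output directory.
object PartFileLayout {
  // Returns (files outside _temporary, files under _temporary),
  // both as paths relative to outputDir.
  def partFiles(outputDir: Path): (Seq[Path], Seq[Path]) = {
    val stream = Files.walk(outputDir)
    try {
      val parts = stream.iterator().asScala
        .filter(p => Files.isRegularFile(p) && p.getFileName.toString.startsWith("part-"))
        .map(p => outputDir.relativize(p))
        .toList
      // A part file counts as "temporary" if any path component is _temporary.
      parts.partition(p => !p.iterator().asScala.exists(_.toString == "_temporary"))
    } finally stream.close()
  }
}

// Example, run locally on each worker host:
//   PartFileLayout.partFiles(java.nio.file.Paths.get("/opt/outputFile/"))
```

Running this on {{worker1}} and {{worker2}} after the write finishes would show whether the two hosts really end up with different layouts, or whether the {{_temporary}} files are only present while the job is still running.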

*_Question_*

1) Why doesn't the {{/opt/outputFile/}} output directory on {{worker1}} contain 
{{part-xxxx}} files, as it does on {{worker2}}? Why is a {{_temporary}} directory 
created, with the {{part-xxx-xx}} files residing in the {{task-xxx}} directories?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
