Also, why are you trying to write results locally if you're not using a 
distributed file system? Spark is geared towards writing to a distributed 
file system. If the result set isn't too big, I would suggest calling 
collect() so the data is sent to the driver and then writing it out from 
there; otherwise, try repartitioning before the write (though I suspect this 
won't really help). You really should install HDFS if that is possible.
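
A minimal sketch of the collect-then-write approach (Scala), assuming the 
result fits in driver memory; the output path is a placeholder, and the 
naive comma-join does no CSV quoting or escaping:

    import java.io.PrintWriter

    val rows   = myDataFrame.collect()                // pull all rows to the driver
    val header = myDataFrame.columns.mkString(",")
    val out    = new PrintWriter("/tmp/myResult.csv") // driver-local path (placeholder)
    try {
      out.println(header)                             // column names first
      rows.foreach(r => out.println(r.mkString(",")))  // naive CSV, no escaping
    } finally {
      out.close()
    }

Note that myDataFrame.coalesce(1).write... would keep the work inside Spark, 
but it still goes through the _temporary commit staging on whichever node 
runs the single task.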

> On Aug 10, 2017, at 3:58 AM, Hemanth Gudela <hemanth.gud...@qvantel.com> 
> wrote:
> 
> Thanks for the reply, Femi!
>  
> I’m writing the file like this:
> myDataFrame.write.mode("overwrite").csv("myFilePath")
> There are absolutely no errors or warnings after the write.
>  
> The _SUCCESS file is created on the master node, but the _temporary problem 
> shows up only on the worker nodes.
>  
> I know spark.write.csv works best with HDFS, but with the current setup in 
> my environment I have to make Spark write to each node’s local file system, 
> not to HDFS.
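>
> For reference, a sketch of a fuller version of that write call (Scala); the 
> coalesce(1), the header option, and the explicit file:// scheme are 
> illustrative additions rather than what I currently run, and the path is a 
> placeholder:
>
>     myDataFrame
>       .coalesce(1)                     // one partition => one part file
>       .write
>       .mode("overwrite")
>       .option("header", "true")        // column names as the first row
>       .csv("file:///tmp/myFilePath")   // explicit local-filesystem scheme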
>  
> Regards,
> Hemanth
>  
> From: Femi Anthony <femib...@gmail.com>
> Date: Thursday, 10 August 2017 at 10.38
> To: Hemanth Gudela <hemanth.gud...@qvantel.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: spark.write.csv is not able write files to specified path, but 
> is writing to unintended subfolder _temporary/0/task_xxx folder on worker 
> nodes
>  
> Normally the _temporary directory gets deleted as part of the cleanup when 
> the write is complete and a _SUCCESS file is created. I suspect that the 
> writes are not completing properly. How are you specifying the write? Are 
> there any error messages in the logs?
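>
> As a quick driver-side check for the commit marker, here is a sketch 
> (Scala); the path is a placeholder, and it assumes the Hadoop FileSystem 
> API bundled with Spark:
>
>     import org.apache.hadoop.fs.Path
>
>     val marker = new Path("file:///tmp/myFilePath/_SUCCESS")
>     val fs     = marker.getFileSystem(spark.sparkContext.hadoopConfiguration)
>     println(s"write committed: ${fs.exists(marker)}")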
>  
> On Thu, Aug 10, 2017 at 3:17 AM, Hemanth Gudela <hemanth.gud...@qvantel.com> 
> wrote:
> Hi,
>  
> I’m running Spark in cluster mode with 4 nodes, and I’m trying to write CSV 
> files to each node’s local path (not HDFS).
> I’m using spark.write.csv to write the CSV files.
>  
> On the master node:
> spark.write.csv creates a folder named after the CSV file and writes many 
> files with the part-r-000n suffix. This is okay for me; I can merge them 
> later.
> But on the worker nodes:
> spark.write.csv creates a folder named after the CSV file and writes many 
> folders and files under _temporary/0/. This is not okay for me.
> Could someone please suggest what might be going wrong in my settings, and 
> how I can get the CSV files written to the specified folder rather than to 
> subfolders (_temporary/0/task_xxx) on the worker machines?
>  
> Thank you,
> Hemanth
>  
> --
> http://www.femibyte.com/twiki5/bin/view/Tech/
> http://www.nextmatrix.com
> "Great spirits have always encountered violent opposition from mediocre 
> minds." - Albert Einstein.
