Spark saveAsText file size

2014-11-24 Thread Alan Prando
Hi Folks! I'm running a Spark job on a cluster with 9 slaves and 1 master (250GB RAM, 32 cores each and 1TB of storage each). This job generates 1.2 TB of data in an RDD with 1200 partitions. When I call saveAsTextFile(hdfs://...), Spark creates 1200 files named part-000* in the HDFS folder.
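
(Editor's note: the one-file-per-partition behavior above is expected; saveAsTextFile writes one part-000NN file per RDD partition. If fewer, larger files are wanted, coalescing before the save is the usual approach. A minimal sketch; the HDFS paths and partition counts here are made up for illustration:)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("SaveExample")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("hdfs://namenode/input")  // hypothetical input path

    // saveAsTextFile writes one part-000NN file per partition,
    // so an RDD with 1200 partitions yields 1200 output files.
    rdd.saveAsTextFile("hdfs://namenode/output-1200-files")

    // coalesce(n) reduces the partition count (without a full shuffle),
    // so the same data lands in fewer, larger files.
    rdd.coalesce(100).saveAsTextFile("hdfs://namenode/output-100-files")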

Re: Spark saveAsText file size

2014-11-24 Thread Yanbo Liang
An in-memory cache may blow up the size of an RDD: in general, an RDD takes more space in memory than on disk. There are options for configuring and optimizing storage space efficiency in Spark; take a look at https://spark.apache.org/docs/latest/tuning.html
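
(Editor's note: the tuning guide linked above suggests, among other things, caching in serialized form and using Kryo serialization to bring the in-memory footprint closer to the on-disk size. A hedged sketch of those two settings, with a hypothetical input path:)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val conf = new SparkConf()
      .setAppName("CacheTuning")
      // Kryo is generally more compact than the default Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    val rdd = sc.textFile("hdfs://namenode/input")  // hypothetical path

    // MEMORY_ONLY_SER stores each partition as a serialized byte buffer:
    // slower to access than the default deserialized MEMORY_ONLY level,
    // but much closer in size to the data as written on disk.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)
    println(rdd.count())  // force an action so the cache is populated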