Is this what you are looking for?

In Shark, the default reducer number is 1 and is controlled by the property
mapred.reduce.tasks. Spark SQL deprecates this property in favor of
spark.sql.shuffle.partitions, whose default value is 200. Users may customize
this property via SET:
SET spark.sql.shuffle.partitions=10;
SELECT page, count(*) c
FROM logs_last_month_cached
GROUP BY page ORDER BY c DESC LIMIT 10;
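
For the submitted application itself, the same property can also be set
programmatically instead of in spark-env.sh. A minimal Scala sketch, assuming
Spark 1.1 with a plain SQLContext (the app name below is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// "ShufflePartitionsExample" is a hypothetical app name.
val conf = new SparkConf().setAppName("ShufflePartitionsExample")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Override the default of 200 shuffle partitions programmatically ...
sqlContext.setConf("spark.sql.shuffle.partitions", "10")

// ... or through the SQL interface, equivalent to the SET statement above.
sqlContext.sql("SET spark.sql.shuffle.partitions=10")

If I remember correctly, spark-submit also accepts
--conf spark.sql.shuffle.partitions=10 on the command line, which avoids
hard-coding the value in the application.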

Spark SQL Programming Guide - Spark 1.1.0 Documentation (spark.apache.org)


________________________________
 From: shahab <shahab.mok...@gmail.com>
To: user@spark.apache.org 
Sent: Tuesday, October 28, 2014 3:20 PM
Subject: How can number of partitions be set in "spark-env.sh"?
 


I am running a standalone Spark cluster with 2 workers, each with 2 cores.
Apparently I am loading and processing a relatively large chunk of data, so
I receive a task failure " ". As I read from some posts and discussions on the
mailing list, the failures could be related to the large amount of data being
processed per partition, and if I have understood correctly I should have
smaller (but more) partitions?!

Is there any way that I can set the number of partitions dynamically in
"spark-env.sh" or in the submitted Spark application?


best,
/Shahab
