You can check org.apache.spark.sql.internal.SQLConf for other default settings as well:

  val SHUFFLE_PARTITIONS = SQLConfigBuilder("spark.sql.shuffle.partitions")
    .doc("The default number of partitions to use when shuffling data for joins or aggregations.")
    .intConf
    .createWithDefault(200)
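The point is that spark.default.parallelism only governs RDD operations; Spark SQL shuffles (DataFrame joins and aggregations) read spark.sql.shuffle.partitions instead, which is why your setting had no effect. A minimal sketch of overriding it, assuming Spark 2.x (the app name and the value 20 are illustrative; on 1.x you would call sqlContext.setConf instead):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("shuffle-partitions-demo")           // hypothetical app name
    .config("spark.sql.shuffle.partitions", "20") // override the 200 default
    .getOrCreate()

  // It can also be changed at runtime, per session:
  spark.conf.set("spark.sql.shuffle.partitions", "20")

With that set, subsequent joins and aggregations produce 20 shuffle partitions instead of 200, so the job writes at most 20 output files and the empty part-files go away.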
> On 20 May 2016, at 13:17, 251922...@qq.com wrote:
>
> Hi all.
> I set spark.default.parallelism to 20 in spark-defaults.conf and sent
> this file to all nodes.
> But I found the reduce number is still the default value, 200.
> Does anyone else encounter this problem? Can anyone give some advice?
>
> ############
> [Stage 9:>   (0 + 0) / 200]
> [Stage 9:>   (0 + 2) / 200]
> [Stage 9:>   (1 + 2) / 200]
> [Stage 9:>   (2 + 2) / 200]
> #######
>
> And this results in many empty files. Because my data is small, only some of
> the 200 files contain data.
> #######
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00000
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00001
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00002
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00003
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00004
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00005
> ########