Hi Ted,
Any chance you could expand on the SQLConf parameters, with more explanation
of when and why to change these settings?
Not all of them are made clear in the descriptions.
Thanks!
Best,
Ovidiu
> On 31 May 2016, at 16:30, Ted Yu wrote:
>
> Maciej:
> You can refer
If you don't mind using the newest version, you can try v2.0-preview.
http://spark.apache.org/news/spark-2.0.0-preview.html
There, you can control the number of input partitions without shuffles via the
two parameters below:
spark.sql.files.maxPartitionBytes
spark.sql.files.openCostInBytes
( Not
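For reference, these settings can be passed as ordinary Spark configuration. A minimal sketch (the values below are illustrative defaults, not recommendations, and my_job.jar is a hypothetical application jar):

```shell
# Illustrative spark-submit fragment for Spark 2.0-preview.
# spark.sql.files.maxPartitionBytes caps how much data is packed into a
# single input partition; spark.sql.files.openCostInBytes is the estimated
# cost (in bytes) of opening a file, used when packing small files together.
spark-submit \
  --conf spark.sql.files.maxPartitionBytes=134217728 \
  --conf spark.sql.files.openCostInBytes=4194304 \
  my_job.jar
```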
Maciej:
You can refer to the doc in
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
for these parameters.
On Tue, May 31, 2016 at 7:27 AM, Takeshi Yamamuro
wrote:
> If you don't mind using the newest version, you can try v2.0-preview.
>
Thanks.
Under what conditions can the number of partitions be higher than minPartitions
when reading a textFile? Should this be considered an infrequent situation?
To sum up: is there a more efficient way to ensure an exact number of
partitions than the following:
rdd = sc.textFile("perf_test1.csv",
After setting shuffle to true I get the expected 128 partitions, but I'm
worried about the performance of such a solution - especially since I see that
some shuffling is done, because the size of the partitions changes:
scala> sc.textFile("hdfs:///proj/dFAB_test/testdata/perf_test1.csv",
minPartitions=128).coalesce(128,
Value for shuffle is false by default.
Have you tried setting it to true ?
Which Spark release are you using ?
On Tue, May 31, 2016 at 6:13 AM, Maciej Sokołowski
wrote:
> Hello Spark users and developers.
>
> I read file and want to ensure that it has exact number of
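Ted's suggestion above can be sketched in a small self-contained example. This uses parallelize as a stand-in for textFile so it runs without an input file, and assumes Spark is on the classpath with a local master:

```scala
// Minimal sketch of coalesce semantics (assumptions: local master,
// parallelize standing in for textFile's input partitions).
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("coalesce-demo").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 1000, 4)

    // shuffle = false (the default) can only merge partitions,
    // so it cannot grow 4 partitions into 128:
    val merged = rdd.coalesce(128)
    println(merged.getNumPartitions)  // still 4

    // shuffle = true performs a full repartition and yields exactly 128:
    val exact = rdd.coalesce(128, shuffle = true)
    println(exact.getNumPartitions)  // 128

    sc.stop()
  }
}
```

The trade-off is the one raised in the thread: the shuffle = true path guarantees the exact count but pays the cost of redistributing all records over the network.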
Hello Spark users and developers.
I read a file and want to ensure that it has an exact number of partitions,
for example 128.
In the documentation I found:
def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]
But the argument here is the minimal number of partitions, so I use