subject:"Repartition question"

Spark repartition question...

2017-04-30 Thread Muthu Jayakumar

Hello there, I am trying to understand the difference between the following repartition() overloads: a. def repartition(partitionExprs: Column*): Dataset[T] b. def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T] c. def repartition(numPartitions: Int): Dataset[T] My understanding is
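For reference, the three overloads quoted above can be compared with a minimal sketch like the one below. It assumes a local Spark 2.x or later session; the object name, the column names `key` and `value`, and the sample rows are hypothetical and not taken from the thread. Per the Dataset API documentation, (a) hash-partitions by the given expressions and takes its partition count from `spark.sql.shuffle.partitions`, (b) hash-partitions by the expressions into the requested number of partitions, and (c) only fixes the partition count without keying on any column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RepartitionSketch {
  // Returns the partition counts produced by overloads (b) and (c).
  def explicitCounts(df: DataFrame, n: Int): (Int, Int) = {
    val b = df.repartition(n, col("key")) // (b) hash-partition by "key" into n partitions
    val c = df.repartition(n)             // (c) n partitions, no partitioning expression
    (b.rdd.getNumPartitions, c.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, for illustration only.
    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

    // (a) hash-partition by "key"; the count is driven by
    // spark.sql.shuffle.partitions (adaptive execution may coalesce it).
    val a = df.repartition(col("key"))
    println(s"(a) partitions: ${a.rdd.getNumPartitions}")

    val (bCount, cCount) = explicitCounts(df, 4)
    println(s"(b) partitions: $bCount")
    println(s"(c) partitions: $cCount")

    spark.stop()
  }
}
```

With (a) and (b), rows sharing the same `key` end up in the same partition; (c) makes no such guarantee.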

Re: Repartition question

2015-08-04 Thread Richard Marscher

Hi, it is possible to control the number of partitions for the RDD without calling repartition by setting the max split size for the Hadoop input format used. Tracing through the code, XmlInputFormat extends FileInputFormat which determines the number of splits (which NewHadoopRDD uses to
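A minimal sketch of the approach described in this reply, under stated assumptions: it uses Hadoop's standard `TextInputFormat` as a stand-in for the `XmlInputFormat` from the thread, a temporary file of roughly 1 MB as hypothetical input, and an arbitrary 64 KB cap. The property `mapreduce.input.fileinputformat.split.maxsize` is the maximum split size read by the new-API `FileInputFormat`; lowering it should yield more splits, and therefore more partitions, without a `repartition` call.

```scala
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SplitSizeSketch {
  // Reads `path` through the new Hadoop API with a capped split size and
  // returns the number of partitions of the resulting RDD.
  def countPartitions(sc: SparkContext, path: String, maxSplitBytes: Long): Int = {
    // Copy the context's Hadoop configuration so the cap applies to this read only.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", maxSplitBytes)
    val rdd = sc.newAPIHadoopFile(
      path,
      classOf[TextInputFormat], // stand-in for the thread's XmlInputFormat
      classOf[LongWritable],
      classOf[Text],
      conf)
    rdd.getNumPartitions
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("split-size-sketch"))

    // Hypothetical input: 10,000 lines of 100 bytes each (about 1 MB).
    val file = Files.createTempFile("split-sketch", ".txt")
    Files.write(file, (("x" * 99 + "\n") * 10000).getBytes("UTF-8"))

    val n = countPartitions(sc, file.toUri.toString, 64L * 1024)
    println(s"partitions with a 64 KB max split size: $n")

    sc.stop()
  }
}
```

The same cap can also be set through `FileInputFormat.setMaxInputSplitSize` on a Hadoop `Job`; the configuration key is used here only to keep the sketch short.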

Repartition question

2015-08-03 Thread Naveen Madhire

Hi All, I am running the Wikipedia parsing example from the Advanced Analytics with Spark book. https://github.com/sryza/aas/blob/d3f62ef3ed43a59140f4ae8afbe2ef81fc643ef2/ch06-lsa/src/main/scala/com/cloudera/datascience/lsa/ParseWikipedia.scala#l112 The partitions of the RDD returned by

3 matches
