Hi Masaki,

I guess what you saw is the partition number of the last stage, which must be 1 to perform the global phase of LIMIT. To tune the partition number of normal shuffles like joins, you can use spark.sql.shuffle.partitions.
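For example (a minimal sketch, assuming the Spark 1.2/1.3-era SQLContext/HiveContext API; 200 is just an illustrative value, not a recommendation):

---

// Control the number of partitions used by shuffles (joins, aggregations).
// Note this does not change the single-partition final stage of LIMIT.
val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
ctx.setConf("spark.sql.shuffle.partitions", "200")

---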

Cheng

On 2/26/15 5:31 PM, masaki rikitoku wrote:
Hi all

I'm currently trying Spark SQL with HiveContext.

When I execute an HQL query like the following:

---

val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
import ctx._

val queries = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")

---

It seems that the number of partitions of the resulting queries SchemaRDD is set to 1.

Is this the intended behavior of SchemaRDD, Spark SQL, and HiveContext?

Is there any way to set the number of partitions to an arbitrary value, other than an explicit repartition?
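(For reference, a minimal sketch of the explicit-repartition workaround mentioned above, assuming the Spark 1.2-era SchemaRDD API, where repartition is inherited from RDD; 200 is just an example value:)

---

val queries = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")
// Spread the single output partition of the LIMIT across 200 partitions
// before doing further work on it.
val spread = queries.repartition(200)

---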


Masaki Rikitoku
