Hi Masaki,

I guess what you saw is the partition number of the last stage, which must be 1 to perform the global phase of LIMIT. To tune the partition number of normal shuffles like joins, you can use spark.sql.shuffle.partitions.
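For example (a minimal sketch, assuming the Spark 1.2/1.3-era SQLContext/HiveContext API; 200 is just an illustrative value, not a recommendation):

---

// Control the number of partitions used by shuffles (joins, aggregations).
// Note this does not change the single-partition final stage of LIMIT.
val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
ctx.setConf("spark.sql.shuffle.partitions", "200")

---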

Cheng

On 2/26/15 5:31 PM, masaki rikitoku wrote:
Hi all

I'm currently trying Spark SQL with HiveContext.

When I execute an HQL query like the following:

---

val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
import ctx._

val queries = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")

---

It seems that the number of partitions of the resulting queries SchemaRDD is set to 1.

Is this the intended behavior of SchemaRDD, Spark SQL, and HiveContext?

Is there any way to set the number of partitions to an arbitrary value, other than an explicit repartition?
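(For reference, a minimal sketch of the explicit-repartition workaround mentioned above, assuming the Spark 1.2-era SchemaRDD API, where repartition is inherited from RDD; 200 is just an example value:)

---

val queries = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")
// Spread the single output partition of the LIMIT across 200 partitions
// before doing further work on it.
val spread = queries.repartition(200)

---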


Masaki Rikitoku
