(for example, you might be able to avoid a shuffle when doing joins on tables that are already bucketed by exposing more metastore information to the planner).
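To make the quoted idea concrete, here is a minimal sketch in plain Python. It is not Spark or Hive code. It assumes both tables were written with the same bucket count and the same bucketing function on the join key, so bucket i of one table only ever needs to be joined against bucket i of the other and no rows have to be redistributed. The names `bucket_of`, `write_bucketed`, and `bucketed_join` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumption: both tables use the same bucket count


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Stand-in for the table's bucketing hash; integer keys are assumed here.
    return key % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Group (key, value) rows into buckets, as a bucketed table would store them."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left_buckets, right_buckets):
    """Join bucket i of the left table only against bucket i of the right table."""
    if len(left_buckets) != len(right_buckets):
        # Mismatched layouts cannot be joined bucket by bucket.
        raise ValueError("bucket counts differ; a shuffle would be needed")
    result = []
    for left, right in zip(left_buckets, right_buckets):
        # Build a small hash index over one bucket of the right table.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        # Probe it with the matching bucket of the left table.
        for key, value in left:
            for other in index.get(key, []):
                result.append((key, value, other))
    return result


orders = write_bucketed([(1, "order-a"), (2, "order-b"), (5, "order-c")])
users = write_bucketed([(1, "alice"), (5, "bob"), (7, "carol")])
joined = sorted(bucketed_join(orders, users))
print(joined)
```

In a real planner the same check (matching bucket column, count, and hash function) would have to come from the metastore metadata before the shuffle could be skipped.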
Can you provide more input on how to implement this functionality, so that I can speed up a join between two Hive tables that each have a few billion rows?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Support-for-Hive-buckets-tp8421p9905.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.