Cheng Hao created SPARK-11512:
---------------------------------
Summary: Bucket Join
Key: SPARK-11512
URL: https://issues.apache.org/jira/browse/SPARK-11512
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Cheng Hao
Support a sort merge join on two datasets on the file system that have already
been partitioned the same way, with the same number of partitions, and sorted
within each partition. In that case we don't need to sort the data again when
joining on the sorted/partitioned keys.
This functionality exists in
- Hive (hive.optimize.bucketmapjoin.sortedmerge)
- Pig (USING 'merge')
- MapReduce (CompositeInputFormat)
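
To illustrate the idea, here is a minimal sketch in plain Python, not Spark
code. It assumes both sides are split into the same number of buckets by the
same function of the join key, and that each bucket is already sorted by that
key. The names merge_join_bucket and bucket_join are hypothetical and exist
only for this example.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); a hypothetical row shape


def merge_join_bucket(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two buckets that are already sorted by key, with no sort step."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row carrying this key with that right-side run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    yield (lk, left[i][1], rv)
                i += 1
            j = j_end


def bucket_join(left_buckets: List[List[Row]],
                right_buckets: List[List[Row]]) -> List[Tuple[int, str, str]]:
    """Join bucket i of the left side only with bucket i of the right side."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    out: List[Tuple[int, str, str]] = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join_bucket(lb, rb))
    return out


# Two datasets bucketed by key % 2, each bucket sorted by key.
left = [[(0, "a"), (2, "b"), (2, "c")], [(1, "d"), (3, "e")]]
right = [[(2, "x"), (4, "y")], [(1, "z"), (1, "w")]]
result = bucket_join(left, right)
```

Because matching keys can only land in buckets with the same index, each bucket
pair is joined independently in a single linear pass, and neither a shuffle nor
a sort is needed.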