optimize mapjoin to use distributedcache
----------------------------------------
Key: HIVE-1599
URL: https://issues.apache.org/jira/browse/HIVE-1599
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Fix For: 0.7.0
Currently, in a mapjoin, each mapper reads the file from HDFS on its own.
This creates problems when the number of mappers is very high.
It would be better to put the files in the distributedcache before the job
starts, so that the mappers can read them from the cache instead of reading
from HDFS as they do currently.
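To illustrate why this matters, here is a minimal sketch in plain Java that
counts remote reads under the two schemes. It is a toy model, not Hive code
and not the Hadoop DistributedCache API. The class name, the method names,
the round-robin placement of mappers on nodes, and the mapper and node counts
are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class MapJoinCacheSketch {

    // Current behaviour as described above: every mapper fetches the
    // file from HDFS itself, so there is one remote read per mapper.
    static int remoteReadsWithoutCache(int mappers) {
        int reads = 0;
        for (int m = 0; m < mappers; m++) {
            reads++;
        }
        return reads;
    }

    // Proposed behaviour (simplified): the file is copied to a node once,
    // and every later mapper on that node reads the local copy.
    static int remoteReadsWithCache(int mappers, int nodes) {
        Set<Integer> nodesWithLocalCopy = new HashSet<>();
        int reads = 0;
        for (int m = 0; m < mappers; m++) {
            int node = m % nodes; // hypothetical round-robin placement
            // add() returns true only the first time a node is seen,
            // which is the only time a remote read is needed.
            if (nodesWithLocalCopy.add(node)) {
                reads++;
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        int mappers = 10000; // assumed value
        int nodes = 100;     // assumed value
        System.out.println("without cache: " + remoteReadsWithoutCache(mappers));
        System.out.println("with cache:    " + remoteReadsWithCache(mappers, nodes));
    }
}
```

In this model the number of remote reads grows with the number of mappers in
the first case, and is bounded by the number of nodes in the second.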