optimize mapjoin to use distributedcache
----------------------------------------

                 Key: HIVE-1599
                 URL: https://issues.apache.org/jira/browse/HIVE-1599
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
             Fix For: 0.7.0


Currently, in the case of a mapjoin, each mapper reads the file on its own
(from HDFS). This creates problems when the number of mappers is very high.

It would be better to put the files in the DistributedCache before the job
starts, so that the mappers can read them from the cache instead of reading
from HDFS as they do currently.
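
To illustrate why this helps, here is a minimal sketch in plain Java. It is a
toy model, not Hive or Hadoop code: the class name, the method names, and the
job shape (1000 mappers on 20 nodes) are all hypothetical. It assumes that a
file placed in the DistributedCache is copied to each node once and then
shared by every mapper running on that node, and it simply counts how many
times the file is fetched from HDFS under each scheme.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model (not Hive or Hadoop code) comparing how often the mapjoin
 * file is fetched from HDFS with and without a per-node cache.
 */
public class MapJoinCacheSketch {

    /** Current behaviour: every mapper fetches the file from HDFS itself. */
    static int hdfsReadsPerMapper(int[] mapperToNode) {
        return mapperToNode.length;
    }

    /** Proposed behaviour (assumed): one fetch per node, reused by its mappers. */
    static int hdfsReadsWithCache(int[] mapperToNode) {
        Set<Integer> localizedNodes = new HashSet<>();
        int reads = 0;
        for (int node : mapperToNode) {
            // Set.add returns true only the first time a node is seen,
            // i.e. when the file still has to be copied to that node.
            if (localizedNodes.add(node)) {
                reads++;
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        // Hypothetical job: 1000 mappers spread round-robin over 20 nodes.
        int[] mapperToNode = new int[1000];
        for (int i = 0; i < mapperToNode.length; i++) {
            mapperToNode[i] = i % 20;
        }
        System.out.println("reads, one per mapper: " + hdfsReadsPerMapper(mapperToNode));
        System.out.println("reads, one per node:   " + hdfsReadsWithCache(mapperToNode));
    }
}
```

In this model the number of HDFS reads grows with the number of nodes rather
than with the number of mappers, which is the saving the proposal is after.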

