[
https://issues.apache.org/jira/browse/HIVE-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913664#action_12913664
]
Jacob Rideout commented on HIVE-1599:
-------------------------------------
Additionally, if JVM reuse is enabled, mappers that run within the same JVM can
reuse an in-memory (static?) copy of the data. When we implement map joins (in
a non-Hive Java map-reduce job) with JVM reuse enabled, we've seen
significant performance improvements when there are many maps.
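
A minimal sketch of the static-copy idea, in plain Java so it runs without
Hadoop on the classpath. The class name SmallTableCache, its methods, and the
loader argument are hypothetical and only for illustration; this is not Hive
code. It assumes the small side of the join fits in memory as a
Map<String, String> and that tasks in a reused JVM share static state.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper: keeps the small join table in a static field so that
 * every map task executed in the same (reused) JVM shares a single copy.
 */
final class SmallTableCache {
    // Static state outlives a single task when the JVM is reused.
    private static Map<String, String> table;
    // Counts how often the table was actually loaded (for illustration only).
    private static int loadCount = 0;

    private SmallTableCache() {}

    /**
     * Returns the cached table, invoking the loader only on the first call
     * in this JVM. The loader stands in for whatever reads the small table
     * (local file, HDFS, distributed cache); that part is an assumption.
     */
    static synchronized Map<String, String> get(Supplier<Map<String, String>> loader) {
        if (table == null) {
            table = loader.get();
            loadCount++;
        }
        return table;
    }

    static synchronized int loadCount() {
        return loadCount;
    }
}
```

In a real job, a mapper would presumably call SmallTableCache.get(...) from
its configure() or setup() method, with JVM reuse turned on through the
mapred.job.reuse.jvm.num.tasks job property. Without JVM reuse each task gets
a fresh JVM, so the static field is empty again and the table is reloaded.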
> optimize mapjoin to use distributedcache
> ----------------------------------------
>
> Key: HIVE-1599
> URL: https://issues.apache.org/jira/browse/HIVE-1599
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
> Fix For: 0.7.0
>
>
> Currently, each mapper reads the file locally in case of a mapjoin. This
> creates problems if the number of mappers is very high.
> It would be optimal to put the files in the distributedcache before the job
> starts, and then the mappers can read it from the cache instead of reading
> from hdfs as they do currently.