[ https://issues.apache.org/jira/browse/HIVE-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913664#action_12913664 ]
Jacob Rideout commented on HIVE-1599:
-------------------------------------

Additionally, if JVM reuse is enabled, the mappers that run within the same JVM can reuse an in-memory (static?) copy of the data. When we implement map joins (in a non-Hive Java map-reduce job) with JVM reuse enabled, we have seen significant performance improvements with many maps.

> optimize mapjoin to use distributedcache
> ----------------------------------------
>
>                 Key: HIVE-1599
>                 URL: https://issues.apache.org/jira/browse/HIVE-1599
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>             Fix For: 0.7.0
>
>
> Currently, each mapper reads the file locally in case of a mapjoin. This creates problems if the number of mappers is very high.
> It would be optimal to put the files in the distributedcache before the job starts, and then the mappers can read it from the cache instead of reading from hdfs as they do currently.
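
To make the "static copy" idea concrete, here is a minimal sketch in plain Java with no Hadoop dependencies. It is not Hive's implementation and not the code from the job described above; the class name SmallTableCache, the loader argument, and the sample rows are all hypothetical. The assumption is only that tasks sharing a reused JVM also share static fields, so the small table is loaded by the first task and reused by the rest.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-JVM cache for the small (in-memory) side of a map join. */
public final class SmallTableCache {
    // Static state outlives a single task, so tasks in a reused JVM share it.
    private static Map<String, String> table;
    private static int loadCount = 0;

    private SmallTableCache() {}

    /** Returns the cached table; the loader runs only on the first call in this JVM. */
    public static synchronized Map<String, String> get(Supplier<Map<String, String>> loader) {
        if (table == null) {
            table = loader.get();
            loadCount++;
        }
        return table;
    }

    /** Number of times the loader has actually run in this JVM. */
    public static synchronized int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        // Stand-in for reading the small table from a local (cached) file.
        Supplier<Map<String, String>> loader = () -> {
            Map<String, String> m = new HashMap<>();
            m.put("1", "alice");
            m.put("2", "bob");
            return m;
        };
        // Simulate three map tasks running one after another in the same JVM.
        for (int task = 0; task < 3; task++) {
            Map<String, String> small = SmallTableCache.get(loader);
            System.out.println("task " + task + " joined key 1 -> " + small.get("1"));
        }
        System.out.println("loads: " + SmallTableCache.loadCount()); // prints "loads: 1"
    }
}
```

In a real job the call to get would sit in the mapper's setup step, and the benefit only appears when JVM reuse is turned on (in Hadoop of this era, via mapred.job.reuse.jvm.num.tasks). Without reuse, every task starts a fresh JVM and the static field is empty again.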