[jira] Commented: (HIVE-1599) optimize mapjoin to use distributedcache

Jacob Rideout (JIRA) Wed, 22 Sep 2010 09:53:58 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913664#action_12913664 ]


Jacob Rideout commented on HIVE-1599:
-------------------------------------

Additionally, if JVM reuse is enabled, mappers that run within the same JVM
can reuse an in-memory (static?) copy of the data. When we implement map joins
(in a non-Hive Java MapReduce job) with JVM reuse enabled, we've seen
significant performance improvements with many maps.
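
A rough sketch of the static-copy idea, not taken from Hive or from our actual
job: the class name SharedJoinTable, the loader argument and the String
key/value types are all hypothetical. The table lives in a static field, so
the first task in a JVM pays the load cost and later tasks in the same
(reused) JVM get the already-built map.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder for the small side of a map join.
// Static state survives across tasks only when they share one JVM.
final class SharedJoinTable {
    private static volatile Map<String, String> table;
    // Counts how often the loader actually ran (for illustration only).
    private static final AtomicInteger loads = new AtomicInteger();

    private SharedJoinTable() {}

    // Returns the cached table, invoking the loader only on first use.
    static Map<String, String> get(Supplier<Map<String, String>> loader) {
        Map<String, String> local = table;
        if (local == null) {
            synchronized (SharedJoinTable.class) {
                local = table;
                if (local == null) {
                    local = loader.get();
                    loads.incrementAndGet();
                    table = local;
                }
            }
        }
        return local;
    }

    static int loadCount() {
        return loads.get();
    }
}
```

In a mapper this would presumably be called once from the task setup method,
with a loader that reads the small table from its local file. The sketch
assumes the table does not change for the lifetime of the JVM.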

> optimize mapjoin to use distributedcache
> ----------------------------------------
>
>                 Key: HIVE-1599
>                 URL: https://issues.apache.org/jira/browse/HIVE-1599
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>             Fix For: 0.7.0
>
>
> Currently, each mapper reads the file locally in case of a mapjoin. This
> creates problems if the number of mappers is very high.
> It would be optimal to put the files in the distributedcache before the job
> starts, and then the mappers can read it from the cache instead of reading
> from HDFS as they do currently.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
