1. I tried setting hive.mapjoin.cache.numrows to 100; it failed with the same exception. (A sketch of roughly what the run looked like is below.)

2. Actually, we used to do the same thing by loading the small table into the memory of each map node in a plain map-reduce job on the same cluster, where the same heap size is guaranteed as when running the Hive map-side join. OOM exceptions never happened there: only about 1 MB was needed to load those 20k records, with mapred.child.java.opts set to -Xmx200m.
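For reference, the failing run looked roughly like the following sketch; the large table click_log and the join condition are simplified stand-ins, not our real query:

-- cap the number of rows the map join caches in memory before spilling to JDBM
set hive.mapjoin.cache.numrows=100;

-- hint Hive to load the small table (application) into each mapper
SELECT /*+ MAPJOIN(a) */ c.url, a.class_id
FROM click_log c
JOIN application a ON (c.url = a.url_pattern)
WHERE a.dt = '2009-06-14';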
here is the schema of our small table:

> describe application;
transaction_id    string
subclass_id       string
class_id          string
memo              string
url_alias         string
url_pattern       string
dt                string    (daily partitioned)

Thanks,
Min

On Mon, Jun 15, 2009 at 12:51 PM, Namit Jain <[email protected]> wrote:
> 1. Can you reduce the number of cached rows and try ?
>
> 2. Were you using default memory settings of the mapper ? If yes, can you
> increase it and try ?
>
> It would be useful to try both of them independently – it would give a good
> idea of the memory consumption of JDBM also.
>
> Can you send the exact schema/data of the small table if possible ? You can
> file a jira and load it there if it is not a security issue.
>
> Thanks,
> -namit
>
> On 6/14/09 9:23 PM, "Min Zhou" <[email protected]> wrote:
>
> 20k

--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
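P.S. In case it helps anyone hitting the same problem, Namit's two suggestions map onto session settings like these (the heap size below is only an example value, not a recommendation):

-- suggestion 1: reduce the number of cached rows
set hive.mapjoin.cache.numrows=100;
-- suggestion 2: raise the mapper heap from the default
set mapred.child.java.opts=-Xmx512m;

Rerunning the same map-join query with each change applied on its own, as Namit suggests, should isolate how much memory JDBM itself consumes.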
