I was looking at the code - and there may be a bug in the cartesian product codepath for map-join.
Can you do an explain plan and send it?

On 6/14/09 10:06 PM, "Min Zhou" <[email protected]> wrote:

1. Tried setting hive.mapjoin.cache.numrows to 100; it failed with the same exception.
2. Actually, we used to do the same thing - loading small tables into the memory of each map node - in a normal map-reduce job on the same cluster, where the same heap size is guaranteed between the Hive map-side join and our map-reduce job. OOM exceptions never happened there, since only about 1 MB was spent loading those 20k records while mapred.child.java.opts was set to -Xmx200m.

Here is the schema of our small table:

> describe application;
transaction_id  string
subclass_id     string
class_id        string
memo            string
url_alias       string
url_pattern     string
dt              string  (daily partitioned)

Thanks,
Min

On Mon, Jun 15, 2009 at 12:51 PM, Namit Jain <[email protected]> wrote:

1. Can you reduce the number of cached rows and try?
2. Were you using the default memory settings of the mapper? If yes, can you increase them and try?

It would be useful to try both of them independently - it would also give a good idea of the memory consumption of JDBM. Can you send the exact schema/data of the small table if possible? You can file a JIRA and upload it there if it is not a security issue.

Thanks,
-namit

On 6/14/09 9:23 PM, "Min Zhou" <[email protected]> wrote:

20k
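For reference, the plan Namit asks for comes from running EXPLAIN on the failing query. A minimal sketch, since the actual query was not posted to the thread: the large table clicks and its column click_time are hypothetical; the /*+ MAPJOIN(a) */ hint is what requests the map-join path, and a join written with no ON clause is what exercises the cartesian product codepath Namit suspects.

    EXPLAIN
    SELECT /*+ MAPJOIN(a) */ a.class_id, a.memo, c.click_time
    FROM application a JOIN clicks c;   -- no ON clause, so this is a cartesian product

Sending the output of that EXPLAIN back to the list would show whether the plan actually took the map-join cartesian path.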

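Namit's two suggestions can also be tried independently from the Hive CLI. A sketch with an illustrative heap size: hive.mapjoin.cache.numrows and Hadoop's mapred.child.java.opts are the settings named in the thread, while -Xmx512m is just an example value above the 200m Min reported.

    set hive.mapjoin.cache.numrows=100;      -- suggestion 1: cache fewer map-join rows in memory
    set mapred.child.java.opts=-Xmx512m;     -- suggestion 2: raise the mapper heap from -Xmx200m

For scale: by Min's own numbers, 20k rows in about 1 MB works out to roughly 50 bytes per row, far below even a 200 MB heap, which points the suspicion at overhead in the map-join cache/JDBM layer rather than at the row data itself.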