I was looking at the code - there may be a bug in the cartesian product
codepath for map-join.

Can you run an explain plan and send it?
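Something like this would produce the plan (a sketch only - "transactions", the
join key, and the partition value below are placeholders for your actual query):

EXPLAIN
SELECT /*+ MAPJOIN(application) */ t.transaction_id, a.class_id
FROM transactions t
JOIN application a ON (t.transaction_id = a.transaction_id)
WHERE a.dt = '2009-06-14';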





On 6/14/09 10:06 PM, "Min Zhou" <[email protected]> wrote:


1. Tried setting hive.mapjoin.cache.numrows to 100; it failed with the same
exception.
2. Actually, we used to do the same thing by loading small tables into the memory
of each map node in a plain map-reduce job on the same cluster, so the same heap
size is guaranteed between the Hive map-side join and our map-reduce job. OOM
exceptions never happened there, since loading those 20k records only took about
1 MB while mapred.child.java.opts was set to -Xmx200m.

Here is the schema of our small table:
> describe application;
transaction_id  string
subclass_id     string
class_id        string
memo            string
url_alias       string
url_pattern     string
dt              string  (daily partition column)
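In CREATE TABLE form that is roughly the following (reconstructed from the
describe output above; storage and row-format clauses omitted):

CREATE TABLE application (
  transaction_id STRING,
  subclass_id    STRING,
  class_id       STRING,
  memo           STRING,
  url_alias      STRING,
  url_pattern    STRING
)
PARTITIONED BY (dt STRING);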

Thanks,
Min
On Mon, Jun 15, 2009 at 12:51 PM, Namit Jain <[email protected]> wrote:
1. Can you reduce the number of cached rows and try?

2. Were you using the default memory settings for the mapper? If yes, can you
increase them and try?

It would be useful to try both of them independently - that would also give a
good idea of the memory consumption of JDBM.
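Concretely, something like this from the Hive CLI (the -Xmx value is only an
example starting point):

-- 1) reduce the number of rows the map join caches in memory
SET hive.mapjoin.cache.numrows=100;
-- 2) give the mapper JVM a larger heap
SET mapred.child.java.opts=-Xmx512m;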


Can you send the exact schema/data of the small table if possible? You can
file a jira and attach it there if it is not a security issue.

Thanks,
-namit



On 6/14/09 9:23 PM, "Min Zhou" <[email protected]> wrote:

20k

