1. I tried setting hive.mapjoin.cache.numrows to 100; it failed with the same exception. (A sketch of roughly what the run looked like is below.)

2. Actually, we used to do the same thing by loading the small table into the memory of each map node in a plain map-reduce job on the same cluster, where the same heap size is guaranteed as when running the Hive map-side join. OOM exceptions never happened there: only about 1 MB was needed to load those 20k records, with mapred.child.java.opts set to -Xmx200m.
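For reference, the failing run looked roughly like the following sketch; the large table click_log and the join condition are simplified stand-ins, not our real query:

-- cap the number of rows the map join caches in memory before spilling to JDBM
set hive.mapjoin.cache.numrows=100;

-- hint Hive to load the small table (application) into each mapper
SELECT /*+ MAPJOIN(a) */ c.url, a.class_id
FROM click_log c
JOIN application a ON (c.url = a.url_pattern)
WHERE a.dt = '2009-06-14';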
here is the schema of our small table:

> describe application;
transaction_id    string
subclass_id       string
class_id          string
memo              string
url_alias         string
url_pattern       string
dt                string    (daily partitioned)

Thanks,
Min

On Mon, Jun 15, 2009 at 12:51 PM, Namit Jain <[email protected]> wrote:
> 1. Can you reduce the number of cached rows and try ?
>
> 2. Were you using default memory settings of the mapper ? If yes, can you
> increase it and try ?
>
> It would be useful to try both of them independently – it would give a good
> idea of the memory consumption of JDBM also.
>
> Can you send the exact schema/data of the small table if possible ? You can
> file a jira and load it there if it is not a security issue.
>
> Thanks,
> -namit
>
> On 6/14/09 9:23 PM, "Min Zhou" <[email protected]> wrote:
>
> 20k

--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
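P.S. In case it helps anyone hitting the same problem, Namit's two suggestions map onto session settings like these (the heap size below is only an example value, not a recommendation):

-- suggestion 1: reduce the number of cached rows
set hive.mapjoin.cache.numrows=100;
-- suggestion 2: raise the mapper heap from the default
set mapred.child.java.opts=-Xmx512m;

Rerunning the same map-join query with each change applied on its own, as Namit suggests, should isolate how much memory JDBM itself consumes.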
