Hi all,

I used the following join in my project:

JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx USING 'replicated'


After loading a large file into a2, I hit an out-of-memory error.

The Pig Latin documentation says that a replicated join loads the right-hand
side table into memory in each mapper, so the join can be computed without
reducers.
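If I understand the docs correctly, in a replicated join every relation after the first is the one held in memory, while the first is streamed, so the large input should come first and the small one second. A minimal sketch (file names and schemas are hypothetical, not from my script):

```pig
-- Hypothetical inputs: 'big_input' is large, 'small_input' must fit in
-- each mapper's heap.
big   = LOAD 'big_input'   AS (k:chararray, v1:chararray);
small = LOAD 'small_input' AS (k:chararray, v2:chararray);

-- 'big' is streamed; 'small' (the right-hand side) is replicated into
-- memory in every map task.
joined = JOIN big BY k LEFT OUTER, small BY k USING 'replicated';
```

In my case the large file ended up as a2, the in-memory side, which would explain the OOM.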

1. Can I resolve this by increasing the heap size? Is the relevant setting
mapred.child.java.opts?

2. Since the input file keeps growing, increasing the heap does not seem
like a long-term solution. Can I refactor that LEFT OUTER JOIN with other
operators? Does anyone have experience with this?
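One option I am considering (just a sketch, reusing the placeholder names from above): dropping USING 'replicated' so Pig falls back to a standard reduce-side join, which does not require either input to fit in memory.

```pig
-- Plain reduce-side join: no in-memory requirement, at the cost of a
-- shuffle/reduce phase.
joined = JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx;

-- If a few keys dominate, a skewed join may help balance the reducers
-- (my assumption -- I have not tried it with LEFT OUTER):
joined_sk = JOIN a1 BY xxx LEFT OUTER, a2 BY xxxx USING 'skewed';
```

Would that be the recommended direction?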

3. Or any other suggestions?

Thanks @_@

-- 

                                               李响

E-mail             :[email protected]
