Hi,

Thank you for the report!

Can you open a JIRA for this issue? It sounds like a bug.

Brock


On Fri, Nov 1, 2013 at 2:23 AM, Mehant Baid <baid.meh...@gmail.com> wrote:

> Hey Folks,
>
> Could you please take a look at the problem below? We are hitting
> OutOfMemoryErrors while joining tables that are not managed by Hive.
>
> Would appreciate any feedback.
>
> Thanks
> Mehant
>
> On 10/7/13 12:04 PM, Mehant Baid wrote:
>
>> Hey Folks,
>>
>> We are using hive-0.11 and are hitting java.lang.OutOfMemoryError. The
>> problem seems to be in CommonJoinResolver.java (processCurrentTask()). In
>> this function we try to convert a map-reduce join into a map join if 'n-1'
>> of the tables involved in an 'n'-way join have a size below a certain
>> threshold.
>>
>> If the tables are managed by Hive then we have accurate sizes for each
>> table and can apply this optimization. However, if the tables are created
>> using storage handlers (HBaseStorageHandler in our case) then the size is
>> set to zero. Because of this we assume that we can apply the optimization
>> and convert the map-reduce join to a map join, and we build an in-memory
>> hash table for all the keys. Since our table created using the storage
>> handler is large, it does not fit in memory and we hit the error.
>>
>> Should I open a JIRA for this? One way to fix this is to set the size of
>> the table created using a storage handler to be equal to the map join
>> threshold. This way that table would be selected as the big table, and we
>> could proceed with the optimization if the other tables in the join have
>> sizes below the threshold. If there are multiple big tables then the
>> optimization would be turned off.
>>
>> Thanks
>> Mehant
>>
>
>
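For anyone following along, here is a minimal, hypothetical Java sketch of
the decision being described. The class and method names are made up and are
not Hive's actual CommonJoinResolver code; it only illustrates how a reported
size of zero slips under the small-table threshold, and how treating a
statistics-less storage-handler table as at least threshold-sized (the
proposed fix) forces it to be the single big table:

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch only -- not Hive source code.
    public class MapJoinDecisionSketch {

        // Stand-in for a table taking part in the join.
        static class JoinTable {
            final String name;
            final long totalSize;             // 0 when backed by a storage handler
            final boolean fromStorageHandler;

            JoinTable(String name, long totalSize, boolean fromStorageHandler) {
                this.name = name;
                this.totalSize = totalSize;
                this.fromStorageHandler = fromStorageHandler;
            }

            // Proposed fix: a storage-handler table with no statistics is
            // treated as at least threshold-sized, so it can only ever be
            // chosen as the (single) big table.
            long effectiveSize(long threshold) {
                if (fromStorageHandler && totalSize == 0) {
                    return threshold;
                }
                return totalSize;
            }
        }

        // Convert to a map join only if at most one table is "big", i.e.
        // n-1 of the n tables fit under the small-table threshold.
        static boolean canConvertToMapJoin(List<JoinTable> tables, long threshold) {
            int bigTables = 0;
            for (JoinTable t : tables) {
                if (t.effectiveSize(threshold) >= threshold) {
                    bigTables++;
                }
            }
            return bigTables <= 1;
        }

        public static void main(String[] args) {
            long threshold = 25L * 1024 * 1024; // e.g. hive.mapjoin.smalltable.filesize
            List<JoinTable> tables = Arrays.asList(
                new JoinTable("small_hive_table", 10L * 1024 * 1024, false),
                new JoinTable("large_hbase_table", 0, true)); // huge, but reports 0
            // Without the fix, the zero size lets the HBase-backed table be
            // hashed into memory; with it, that table counts as the big table.
            System.out.println(canConvertToMapJoin(tables, threshold));
        }
    }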


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
