Re: Issues with joining across large tables

Zheng Shao Sun, 25 Oct 2009 23:08:39 -0700

It's probably caused by the Cartesian product of many rows from the two
tables with the same key.
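
To make the arithmetic concrete: in an inner equi-join, each key contributes
(rows with that key in the left table) x (rows with that key in the right
table) output rows, so one heavily repeated key can dominate the reducer's
output. Below is a minimal Python sketch of that calculation. The key lists
are made up, and the function names join_output_rows and heaviest_keys are
illustrative only; they are not part of Hive.

```python
from collections import Counter


def join_output_rows(left_keys, right_keys):
    """Number of rows an inner equi-join emits for the given join-key columns."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each shared key yields left_count * right_count joined rows.
    return sum(n * right_counts[k] for k, n in left_counts.items() if k in right_counts)


def heaviest_keys(left_keys, right_keys, top=3):
    """Shared keys ranked by how many joined rows each one produces."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    products = {k: n * right_counts[k] for k, n in left_counts.items() if k in right_counts}
    return sorted(products.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical data: key "a" is heavily duplicated on both sides.
left = ["a"] * 1000 + ["b", "c"]
right = ["a"] * 2000 + ["b"]

print(join_output_rows(left, right))      # 2000001
print(heaviest_keys(left, right, top=1))  # [('a', 2000000)]
```

Here about 3,000 input rows produce about 2 million output rows, almost all
from a single key. Counting rows per join key on each table is one way to
check whether the same thing is happening in the real data.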


Zheng

On Sun, Oct 25, 2009 at 7:22 PM, Ryan LeCompte <[email protected]> wrote:

> It also looks like the reducers just never stop outputting lines like the
> following, causing them to ultimately time out and get
> killed by the system.
>
> 2009-10-25 22:21:18,879 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 100000000 rows
> 2009-10-25 22:21:22,009 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 101000000 rows
> 2009-10-25 22:21:22,010 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 101000000 rows
> 2009-10-25 22:21:25,141 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 102000000 rows
> 2009-10-25 22:21:25,142 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 102000000 rows
> 2009-10-25 22:21:28,263 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 103000000 rows
> 2009-10-25 22:21:28,263 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 103000000 rows
> 2009-10-25 22:21:31,387 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 104000000 rows
> 2009-10-25 22:21:31,387 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 104000000 rows
> 2009-10-25 22:21:34,510 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 105000000 rows
> 2009-10-25 22:21:34,510 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 105000000 rows
> 2009-10-25 22:21:37,633 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 106000000 rows
> 2009-10-25 22:21:37,633 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 106000000 rows
>
>
>
> On Sun, Oct 25, 2009 at 9:39 PM, Ryan LeCompte <[email protected]> wrote:
>
>> Hello all,
>>
>> Should I expect to be able to do a Hive JOIN between two tables that have
>> about 10 or 15GB of data each? What I'm noticing (for a simple JOIN) is that
>> all the map tasks complete, but the reducers just hang at around 87% or so
>> (for the first set of 4 reducers), and then they eventually get killed
>> by the cluster for failing to respond. I can do a JOIN between a large
>> table and a very small table of 10 or so records just fine.
>>
>> Any thoughts?
>>
>> Thanks,
>> Ryan
>>
>>
>


-- 
Yours,
Zheng
