It's probably caused by the Cartesian product of many rows from the two tables with the same key.
Zheng On Sun, Oct 25, 2009 at 7:22 PM, Ryan LeCompte <[email protected]> wrote: > It also looks like the reducers just never stop outputting things likethe > (following -- see below), causing them to ultimately time out and get > killed by the system. > > 2009-10-25 22:21:18,879 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 100000000 rows > > 2009-10-25 22:21:22,009 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 > forwarding 101000000 rows > 2009-10-25 22:21:22,010 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 101000000 rows > 2009-10-25 22:21:25,141 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 > forwarding 102000000 rows > > 2009-10-25 22:21:25,142 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 102000000 rows > 2009-10-25 22:21:28,263 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 > forwarding 103000000 rows > 2009-10-25 22:21:28,263 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 103000000 rows > > 2009-10-25 22:21:31,387 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 > forwarding 104000000 rows > 2009-10-25 22:21:31,387 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 104000000 rows > 2009-10-25 22:21:34,510 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 > forwarding 105000000 rows > > 2009-10-25 22:21:34,510 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 105000000 rows > 2009-10-25 22:21:37,633 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 > forwarding 106000000 rows > 2009-10-25 22:21:37,633 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 > forwarding 106000000 rows > > > > On Sun, Oct 25, 2009 at 9:39 PM, Ryan LeCompte <[email protected]> wrote: > >> Hello all, >> >> Should I expect to be able to do a Hive JOIN between two tables that have >> about 10 or 15GB of data each? What I'm noticing (for a simple JOIN) is that >> all the map tasks complete, but the reducers just hang at around 87% or so >> (for the first set of 4 reducers), and then they eventually just get killed >> due to inability to respond by the cluster. I can do a JOIN between a large >> table and a very small table of 10 or so records just fine. >> >> Any thoughts? >> >> Thanks, >> Ryan >> >> > -- Yours, Zheng
