It may be related to https://issues.apache.org/jira/browse/HIVE-704.
Please make sure the largest table in the join is the rightmost table.
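For example, a sketch against the query below (untested; it assumes usertracking is the larger table and that your Hive build supports the STREAMTABLE hint, which forces the streamed table without reordering the FROM clause). Since the reducer log also warns about many rows with join key [null], a second variant drops null keys before the join to shrink the buffered side:

```sql
-- Sketch 1 (assumption: STREAMTABLE hint available in this build):
-- stream usertracking so the reducer buffers the other side.
SELECT /*+ STREAMTABLE(UT) */ COUNT(UT.UserID)
FROM streamtransfers ST
JOIN usertracking UT ON (ST.usertrackingid = UT.usertrackingid)
WHERE UT.UserID IS NOT NULL AND UT.UserID <> 0
GROUP BY UT.UserID;

-- Sketch 2: filter out null join keys before the join; the log shows
-- the reducer accumulating rows for join key [null].
SELECT COUNT(UT.UserID)
FROM (SELECT usertrackingid FROM streamtransfers
      WHERE usertrackingid IS NOT NULL) ST
JOIN usertracking UT ON (ST.usertrackingid = UT.usertrackingid)
WHERE UT.UserID IS NOT NULL AND UT.UserID <> 0
GROUP BY UT.UserID;
```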
Thanks,
Ning
On Oct 20, 2009, at 10:02 AM, Chris Bates wrote:
Hi all,
I'm trying to run this query on two 8 GB datasets:
SELECT COUNT(UT.UserID)
FROM streamtransfers ST
JOIN usertracking UT ON (ST.usertrackingid = UT.usertrackingid)
WHERE UT.UserID IS NOT NULL AND UT.UserID <> 0
GROUP BY UT.UserID;
I've also tried its DISTINCT counterpart.
Hive 0.4.0 on Hadoop 0.20.1 gives me this error with both queries:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver
The logs show the errors below. After seeing this I increased the maximum RAM on all the machines (6 nodes) to 2 GB and still hit the same error. Any ideas?
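(One detail from the syslog below: it reports maximum memory = 208142336, i.e. the child-task JVM heap is still roughly 200 MB, so adding node RAM by itself won't change it. In Hadoop 0.20 the per-task heap comes from mapred.child.java.opts, which defaults to -Xmx200m; an untested sketch of raising it per-session from the Hive CLI, with an illustrative value:)

```sql
-- Untested sketch: raise the child-task JVM heap for this session only.
-- 1024m is illustrative, not a tuned value.
SET mapred.child.java.opts=-Xmx1024m;
```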
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.io.BufferedWriter.<init>(BufferedWriter.java:87)
at java.io.BufferedWriter.<init>(BufferedWriter.java:70)
at java.io.PrintStream.init(PrintStream.java:83)
at java.io.PrintStream.<init>(PrintStream.java:100)
at java.io.PrintStream.<init>(PrintStream.java:62)
at org.apache.hadoop.mapred.Child.main(Child.java:198)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$Packet.<init>(DFSClient.java:2107)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2939)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3148)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3113)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1002)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:208)
at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1419)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:212)
at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:197)
syslog logs:
: Down to the last merge-pass, with 140 segments left of total size: 23370303 bytes
2009-10-20 12:33:59,418 INFO org.apache.hadoop.mapred.ReduceTask: Merged 140 segments, 23370303 bytes to disk to satisfy reduce memory limit
2009-10-20 12:33:59,419 INFO org.apache.hadoop.mapred.ReduceTask: Merging 3 files, 215596911 bytes from disk
2009-10-20 12:33:59,421 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2009-10-20 12:33:59,421 INFO org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2009-10-20 12:33:59,440 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 215596899 bytes
2009-10-20 12:33:59,450 INFO ExecReducer: maximum memory = 208142336
2009-10-20 12:33:59,451 INFO ExecReducer: conf classpath =
[file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/jars/classes,
file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/jars/,
file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/attempt_200910191834_0003_r_000000_0/]
2009-10-20 12:33:59,451 INFO ExecReducer: thread classpath =
[file:/opt/hadoop/conf/, file:/usr/lib/jvm/java-6-sun-1.6.0.14/lib/tools.jar,
file:/opt/hadoop/, file:/opt/hadoop/hadoop-0.20.0-core.jar,
file:/opt/hadoop/lib/commons-cli-2.0-SNAPSHOT.jar,
file:/opt/hadoop/lib/commons-codec-1.3.jar,
file:/opt/hadoop/lib/commons-el-1.0.jar,
file:/opt/hadoop/lib/commons-httpclient-3.0.1.jar,
file:/opt/hadoop/lib/commons-logging-1.0.4.jar,
file:/opt/hadoop/lib/commons-logging-api-1.0.4.jar,
file:/opt/hadoop/lib/commons-net-1.4.1.jar,
file:/opt/hadoop/lib/core-3.1.1.jar, file:/opt/hadoop/lib/hsqldb-1.8.0.10.jar,
file:/opt/hadoop/lib/jasper-compiler-5.5.12.jar,
file:/opt/hadoop/lib/jasper-runtime-5.5.12.jar,
file:/opt/hadoop/lib/jets3t-0.6.1.jar, file:/opt/hadoop/lib/jetty-6.1.14.jar,
file:/opt/hadoop/lib/jetty-util-6.1.14.jar,
file:/opt/hadoop/lib/junit-3.8.1.jar, file:/opt/hadoop/lib/kfs-0.2.2.jar,
file:/opt/hadoop/lib/log4j-1.2.15.jar, file:/opt/hadoop/lib/oro-2.0.8.jar,
file:/opt/hadoop/lib/servlet-api-2.5-6.1.14.jar,
file:/opt/hadoop/lib/slf4j-api-1.4.3.jar,
file:/opt/hadoop/lib/slf4j-log4j12-1.4.3.jar,
file:/opt/hadoop/lib/xmlenc-0.52.jar, file:/opt/hadoop/lib/jsp-2.1/jsp-2.1.jar,
file:/opt/hadoop/lib/jsp-2.1/jsp-api-2.1.jar,
file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/attempt_200910191834_0003_r_000000_0/work/,
file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/jars/classes,
file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/jars/,
file:/opt/data/hadoop/hadoop-hadoop/mapred/local/taskTracker/jobcache/job_200910191834_0003/attempt_200910191834_0003_r_000000_0/work/]
2009-10-20 12:33:59,721 INFO ExecReducer:
<JOIN>Id =5
<Children>
<FIL>Id =6
<Children>
<SEL>Id =7
<Children>
<GBY>Id =8
<Children>
<FS>Id =9
<Parent>Id = 8 <\Parent>
<\FS>
<\Children>
<Parent>Id = 7 <\Parent>
<\GBY>
<\Children>
<Parent>Id = 6 <\Parent>
<\SEL>
<\Children>
<Parent>Id = 5 <\Parent>
<\FIL>
<\Children>
<\JOIN>
2009-10-20 12:33:59,721 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initializing Self 5 JOIN
2009-10-20 12:33:59,723 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: COMMONJOIN struct<key:struct<joinkey0:bigint>,value:struct<>,alias:tinyint>
2009-10-20 12:33:59,733 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: JOIN struct<_col11:bigint> totalsz = 1
2009-10-20 12:33:59,733 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Operator 5 JOIN initialized
2009-10-20 12:33:59,733 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initializing children of 5 JOIN
2009-10-20 12:33:59,733 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing child 6 FIL
2009-10-20 12:33:59,733 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing Self 6 FIL
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Operator 6 FIL initialized
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing children of 6 FIL
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 7 SEL
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 7 SEL
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<_col11:bigint>
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 7 SEL initialized
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 7 SEL
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing child 8 GBY
2009-10-20 12:33:59,740 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing Self 8 GBY
2009-10-20 12:33:59,743 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Operator 8 GBY initialized
2009-10-20 12:33:59,743 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing children of 8 GBY
2009-10-20 12:33:59,743 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 9 FS
2009-10-20 12:33:59,743 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 9 FS
2009-10-20 12:33:59,753 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://crunch:54310/tmp/hive-hadoop/1733025425/_tmp.10002/_tmp.attempt_200910191834_0003_r_000000_0
2009-10-20 12:34:00,404 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 9 FS initialized
2009-10-20 12:34:00,404 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 9 FS
2009-10-20 12:34:00,404 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initialization Done 8 GBY
2009-10-20 12:34:00,404 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 7 SEL
2009-10-20 12:34:00,404 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initialization Done 6 FIL
2009-10-20 12:34:00,404 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: Initialization Done 5 JOIN
2009-10-20 12:34:00,460 INFO ExecReducer: ExecReducer: processing 1 rows: used memory = 121157288
2009-10-20 12:34:00,460 INFO ExecReducer: ExecReducer: processing 10 rows: used memory = 121157288
2009-10-20 12:34:00,462 INFO ExecReducer: ExecReducer: processing 100 rows: used memory = 121157288
2009-10-20 12:34:00,511 INFO ExecReducer: ExecReducer: processing 1000 rows: used memory = 121157288
2009-10-20 12:34:00,511 WARN org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 1000 rows for join key [null]
2009-10-20 12:34:00,611 INFO ExecReducer: ExecReducer: processing 10000 rows: used memory = 121715120
2009-10-20 12:34:01,036 INFO ExecReducer: ExecReducer: processing 100000 rows: used memory = 126734256
2009-10-20 12:34:05,667 INFO ExecReducer: ExecReducer: processing 1000000 rows: used memory = 52519744
2009-10-20 12:34:09,659 INFO ExecReducer: ExecReducer: processing 2000000 rows: used memory = 107668224
2009-10-20 12:34:15,596 INFO ExecReducer: ExecReducer: processing 3000000 rows: used memory = 142126200
2009-10-20 12:34:19,686 INFO ExecReducer: ExecReducer: processing 4000000 rows: used memory = 202505560
2009-10-20 12:35:09,479 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:112)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.computeValues(CommonJoinOperator.java:257)
at org.apache.hadoop.hive.ql.exec.JoinOperator.process(JoinOperator.java:52)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:207)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:465)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:413)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-10-20 12:35:39,325 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task