Premal Shah created TEZ-3499:
--------------------------------
Summary: OOM in local mode
Key: TEZ-3499
URL: https://issues.apache.org/jira/browse/TEZ-3499
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.8.4
Reporter: Premal Shah
We have a Hive UDF that takes in some values and outputs a score. When we run
a query that calls the score function on every row of a table, it looks like
Tez is not running the query on YARN but is trying to run it in local mode.
It then runs out of memory while trying to insert that data into a table.
Here's the query:
{noformat}
ADD JAR score.jar;
CREATE TEMPORARY FUNCTION score AS 'hive.udf.ScoreUDF';
INSERT OVERWRITE TABLE abc
SELECT
  id,
  score(col1, col2) AS score,
  '2016-10-11' AS dt
FROM input_table;
{noformat}
Here's the shell output:
{noformat}
Query ID = hadoop_20161028232841_5a06db96-ffaa-4e75-a657-c7cb46ccb3f5
Total jobs = 1
Launching Job 1 out of 1
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
at java.lang.StringBuilder.append(StringBuilder.java:202)
at com.google.protobuf.TextFormat.escapeBytes(TextFormat.java:1283)
at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:394)
at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
at com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248)
at com.google.protobuf.TextFormat.shortDebugString(TextFormat.java:88)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Java heap space
{noformat}
It looks like the job is not getting submitted to the cluster, but running
locally.
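A minimal sketch of how the execution mode could be checked from the same Hive
session, assuming the standard property names hive.execution.engine (Hive) and
tez.local.mode (Tez, which defaults to false). Running set with a property name
and no value prints the current setting; the actual values depend on the
cluster configuration and are not shown here.
{noformat}
-- Print the execution engine Hive will use in this session (for example mr or tez).
set hive.execution.engine;
-- Print whether Tez local mode is enabled; Hive reports the property as
-- undefined when it has not been set explicitly.
set tez.local.mode;
{noformat}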
We can't get Tez to run the code on the cluster. The Hive shell starts with a
maximum heap size (Xmx) of 4 GB.
What should we change to avoid this problem?