[
https://issues.apache.org/jira/browse/HIVE-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643360#comment-15643360
]
Gopal V commented on HIVE-15138:
--------------------------------
This looks like a case where the query should be modified to use an arithmetic
interval expression instead? (HIVE-5021)
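A sketch of that rewrite, using the interval syntax added in HIVE-5021 (the
{{d1}}/{{d3}} aliases and the predicate are from the TPCDS query 72 fragment
quoted below; {{date_add()}} is shown as a pre-interval alternative):

{noformat}
-- instead of the implicit double coercion triggered by
--   d3.d_date > d1.d_date + 5
-- compare the dates directly with an interval:
... where d3.d_date > d1.d_date + interval '5' day

-- or, on versions without interval literals, with date_add():
... where d3.d_date > date_add(d1.d_date, 5)
{noformat}

Either form keeps the comparison in the date domain, so no UDFToDouble cast is
planted in the plan.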
> String + Integer gets converted to UDFToDouble causing number format exceptions
> -------------------------------------------------------------------------------
>
> Key: HIVE-15138
> URL: https://issues.apache.org/jira/browse/HIVE-15138
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
>
> TPCDS Query 72 has {{"d3.d_date > d1.d_date + 5"}}, where d_date contains
> values like {{2002-02-03, 2001-11-07}}. When running this query, the compiler
> converts the expression into UDFToDouble, which throws a large number of
> {{NumberFormatException}}s while trying to convert the strings to doubles. An
> example stack trace is given below; filling in the stack for every row can be
> a significant performance hit, depending on the amount of data.
> {noformat}
> "TezTaskRunner" #41340 daemon prio=5 os_prio=0 tid=0x00007f7914745000 nid=0x9725 runnable [0x00007f787ee4a000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Throwable.fillInStackTrace(Native Method)
>         at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
>         - locked <0x00007f804b125ab0> (a java.lang.NumberFormatException)
>         at java.lang.Throwable.<init>(Throwable.java:265)
>         at java.lang.Exception.<init>(Exception.java:66)
>         at java.lang.RuntimeException.<init>(RuntimeException.java:62)
>         at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:52)
>         at java.lang.NumberFormatException.<init>(NumberFormatException.java:55)
>         at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
>         at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
>         at java.lang.Double.parseDouble(Double.java:538)
>         at org.apache.hadoop.hive.ql.udf.UDFToDouble.evaluate(UDFToDouble.java:172)
>         at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:967)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:194)
>         at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:194)
>         at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
>         at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:121)
>         at org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleColGreaterDoubleColumn.evaluate(FilterDoubleColGreaterDoubleColumn.java:51)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:110)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:144)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
>         at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> {noformat}
> A simple query to reproduce this issue is given below. It would be helpful if
> Hive emitted an explicit WARN message so that the end user can add explicit
> casts to avoid such situations.
> {noformat}
> Latest Hive (master): (Check UDFToDouble for d_date field)
> ====================
> hive> explain select distinct d_date + 5 from date_dim limit 10;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: rbalamohan_20161107005816_1cc412bf-c19c-45c4-b468-236e4fc8ae09:8
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: date_dim
>                   Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: (UDFToDouble(d_date) + 5.0) (type: double)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       keys: _col0 (type: double)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: double)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: double)
>                         Statistics: Num rows: 73049 Data size: 82034027 Basic stats: COMPLETE Column stats: NONE
>                         TopN Hash Memory Usage: 0.04
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: double)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 36524 Data size: 41016452 Basic stats: COMPLETE Column stats: NONE
>                 Limit
>                   Number of rows: 10
>                   Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 10 Data size: 11230 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: 10
>       Processor Tree:
>         ListSink
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)