[ https://issues.apache.org/jira/browse/HIVE-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated HIVE-13189:
------------------------------------
Attachment: HIVE-13189.1.patch
At TPC-H 1 TB scale, runtime with the patch drops from 244 seconds to 169 seconds (about 31% lower).
Without Patch:
{noformat}
create temporary table x as select l_receiptdate,
date_add(to_date(l_receiptdate), 3) from lineitem;
Status: Running (Executing on YARN cluster with App id application_1456147314798_24782)
----------------------------------------------------------------------------------------------
VERTICES          MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container  SUCCEEDED    262        262        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 244.05 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 244.05 seconds
METHOD                 DURATION(ms)
parse                             4
semanticAnalyze                 920
TezBuildDag                     328
TezSubmitToRunningDag           410
TotalPrepTime                 2,168
VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1             262                0             0            240.99       31,358,960         306,167  5,999,989,709               0
{noformat}
With Patch:
{noformat}
Status: Running (Executing on YARN cluster with App id application_1456147314798_24788)
----------------------------------------------------------------------------------------------
VERTICES          MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container  SUCCEEDED    262        262        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 169.15 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 169.15 seconds
METHOD                 DURATION(ms)
parse                            24
semanticAnalyze               1,545
TezBuildDag                     242
TezSubmitToRunningDag           258
TotalPrepTime                 2,768
VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1             262                0             0            166.24       21,189,670         158,159  5,999,989,709               0
{noformat}
If the approach is fine, this can be extended to datediff as well.
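As a rough illustration of the kind of change under discussion (not the contents of HIVE-13189.1.patch), the sketch below contrasts SimpleDateFormat-based date addition with a formatter-based version. It uses java.time from the JDK as a stand-in for Joda's DateTimeFormatter so that it runs without extra dependencies; both formatters are immutable and thread-safe, unlike SimpleDateFormat. The class and method names (DateAddSketch, dateAddLegacy, dateAddWithFormatter) are hypothetical and do not come from Hive.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;

// Hypothetical sketch; not taken from GenericUDFDateAdd or the attached patch.
class DateAddSketch {

    // Immutable, thread-safe formatter for yyyy-MM-dd, shared across calls.
    // Assumption: java.time stands in here for Joda's DateTimeFormatter.
    private static final DateTimeFormatter ISO_DATE = DateTimeFormatter.ISO_LOCAL_DATE;

    // Baseline: parse with SimpleDateFormat, shift with Calendar, format back.
    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    static String dateAddLegacy(String date, int days) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(fmt.parse(date));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + date, e);
        }
        cal.add(Calendar.DAY_OF_MONTH, days);
        return fmt.format(cal.getTime());
    }

    // Alternative: parse straight to a calendar date and add days.
    static String dateAddWithFormatter(String date, int days) {
        return LocalDate.parse(date, ISO_DATE).plusDays(days).format(ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(dateAddLegacy("1998-12-30", 3));        // 1999-01-02
        System.out.println(dateAddWithFormatter("1998-12-30", 3)); // 1999-01-02
    }
}
```

Both methods should agree on well-formed yyyy-MM-dd input; any performance difference would still need to be measured on the actual workload, as in the runs above.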
> Consider using Joda DateTimeFormatter instead of SimpleDateFormat in
> GenericUDFDateAdd
> --------------------------------------------------------------------------------------
>
> Key: HIVE-13189
> URL: https://issues.apache.org/jira/browse/HIVE-13189
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Rajesh Balamohan
> Assignee: varun a kumar
> Attachments: HIVE-13189.1.patch
>
>
> Tasks spent a significant amount of time parsing date strings in
> GenericUDFDateAdd.
> {noformat}
> java.lang.Thread.State: RUNNABLE
>   at java.text.DecimalFormat.subparse(DecimalFormat.java:1467)
>   at java.text.DecimalFormat.parse(DecimalFormat.java:1268)
>   at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:2088)
>   at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1455)
>   at java.text.DateFormat.parse(DateFormat.java:355)
>   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFDateAdd.evaluate(GenericUDFDateAdd.java:172)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:87)
>   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPGreaterThan.evaluate(GenericUDFOPGreaterThan.java:80)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:108)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>   at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
> {noformat}
> Joda DateTimeFormatter can be considered for better performance.