[jira] [Updated] (HIVE-24746) PTF: TimestampValueBoundaryScanner can be optimised during range computation

Jira Sun, 11 Apr 2021 23:25:14 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-24746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


László Bodor updated HIVE-24746:
--------------------------------
    Fix Version/s: 4.0.0

> PTF: TimestampValueBoundaryScanner can be optimised during range computation
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-24746
>                 URL: https://issues.apache.org/jira/browse/HIVE-24746
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> During range computation, timestamp ranges become a hotspot because of 
> Timestamp comparisons. Each comparison has to construct the entire Timestamp 
> object via the ObjectInspector (OI), which internally incurs LocalTime 
> computation etc.
>  
> All of this is done for an "equals" comparison, which could be performed 
> with just the seconds and nanoseconds present in the Timestamp.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L852]
>  
>  
> The request is to explore optimising this code path so that equals() can be 
> performed on the seconds and nanoseconds instead of the entire Timestamp.
>  
> {noformat}
>       at org.apache.hadoop.hive.common.type.Timestamp.setTimeInSeconds(Timestamp.java:133)
>       at org.apache.hadoop.hive.serde2.io.TimestampWritableV2.populateTimestamp(TimestampWritableV2.java:401)
>       at org.apache.hadoop.hive.serde2.io.TimestampWritableV2.getTimestamp(TimestampWritableV2.java:210)
>       at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1239)
>       at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1181)
>       at org.apache.hadoop.hive.ql.udf.ptf.TimestampValueBoundaryScanner.isEqual(ValueBoundaryScanner.java:848)
>       at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEndCurrentRow(ValueBoundaryScanner.java:593)
>       at org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEnd(ValueBoundaryScanner.java:530)
>       at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getRange(BasePartitionEvaluator.java:273)
>       at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.iterate(BasePartitionEvaluator.java:219)
>       at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:147)
>       at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.access$100(WindowingTableFunction.java:61)
>       at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction$WindowingIterator.next(WindowingTableFunction.java:755)
>       at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
>       at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:104)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>  {noformat}
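
The idea in the description above can be illustrated with a minimal sketch. It assumes a value that already holds its seconds and nanoseconds as plain fields. `RawTimestamp`, `isEqualViaObjects` and `isEqualViaFields` are hypothetical names used only for illustration; they are not Hive classes and this is not the change committed for the issue. The sketch only contrasts building a full date-time object per comparison with comparing the two primitive fields directly.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class TimestampEqualitySketch {

    // Hypothetical stand-in for a serialized timestamp value; NOT a Hive class.
    static final class RawTimestamp {
        final long seconds; // whole seconds since the epoch
        final int nanos;    // nanosecond fraction, 0..999_999_999

        RawTimestamp(long seconds, int nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
        }

        // Expensive path: materialises a full date-time object on every call.
        LocalDateTime toLocalDateTime() {
            return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
        }
    }

    // Baseline: builds two date-time objects just to test equality.
    static boolean isEqualViaObjects(RawTimestamp a, RawTimestamp b) {
        if (a == null || b == null) {
            return a == b; // equal only when both are null
        }
        return a.toLocalDateTime().equals(b.toLocalDateTime());
    }

    // Cheaper: compares the two primitive fields and allocates nothing.
    static boolean isEqualViaFields(RawTimestamp a, RawTimestamp b) {
        if (a == null || b == null) {
            return a == b; // equal only when both are null
        }
        return a.seconds == b.seconds && a.nanos == b.nanos;
    }

    public static void main(String[] args) {
        RawTimestamp a = new RawTimestamp(1618208714L, 500);
        RawTimestamp b = new RawTimestamp(1618208714L, 500);
        RawTimestamp c = new RawTimestamp(1618208714L, 501);

        System.out.println(isEqualViaFields(a, b)); // true
        System.out.println(isEqualViaFields(a, c)); // false
        // Both approaches agree on the result; only the cost differs.
        System.out.println(isEqualViaObjects(a, c) == isEqualViaFields(a, c)); // true
    }
}
```

The two methods give the same answer only while the nanosecond field is kept normalised to the range 0..999,999,999, which the sketch assumes. Whether the real code path can read both fields without materialising the Timestamp would need to be checked against the Hive source.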



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
