[
https://issues.apache.org/jira/browse/HIVE-24746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-24746:
--------------------------------
Fix Version/s: 4.0.0
> PTF: TimestampValueBoundaryScanner can be optimised during range computation
> ----------------------------------------------------------------------------
>
> Key: HIVE-24746
> URL: https://issues.apache.org/jira/browse/HIVE-24746
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> During range computation, timestamp ranges become a hotspot due to
> "TimeStamp" comparisons. It has to construct the entire TimeStamp object via
> OI (which incurs LocalTime computation etc internally).
>
> All these are done for "equals" comparison which can be done with "seconds &
> nanoseconds" present in TimeStamp.
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L852]
>
>
> Request is to explore optimising this code path, so that equals() can be
> performed with "seconds/nanoseconds" instead of entire timestamp
>
> {noformat}
> at
> org.apache.hadoop.hive.common.type.Timestamp.setTimeInSeconds(Timestamp.java:133)
> at
> org.apache.hadoop.hive.serde2.io.TimestampWritableV2.populateTimestamp(TimestampWritableV2.java:401)
> at
> org.apache.hadoop.hive.serde2.io.TimestampWritableV2.getTimestamp(TimestampWritableV2.java:210)
> at
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1239)
> at
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1181)
> at
> org.apache.hadoop.hive.ql.udf.ptf.TimestampValueBoundaryScanner.isEqual(ValueBoundaryScanner.java:848)
> at
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEndCurrentRow(ValueBoundaryScanner.java:593)
> at
> org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeEnd(ValueBoundaryScanner.java:530)
> at
> org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getRange(BasePartitionEvaluator.java:273)
> at
> org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.iterate(BasePartitionEvaluator.java:219)
> at
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateWindowFunction(WindowingTableFunction.java:147)
> at
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.access$100(WindowingTableFunction.java:61)
> at
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction$WindowingIterator.next(WindowingTableFunction.java:755)
> at
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:373)
> at
> org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:104)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:732)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:756)
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)