[ https://issues.apache.org/jira/browse/IMPALA-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062786#comment-18062786 ]
ASF subversion and git services commented on IMPALA-14575:
----------------------------------------------------------
Commit 7d71ec141f85746de50685d77e05314553b235a2 in impala's branch
refs/heads/master from Balazs Hevele
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7d71ec141 ]
IMPALA-14575: Add constant handling for Hive GenericUDFs
Based on PoC by Csaba Ringhofer.
If an argument is constant, it is now evaluated and copied to the
input buffer only once, in HiveUdfCall::OpenEvaluator.
This avoids re-evaluating and re-copying the value on every
evaluation of the UDF.
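
The evaluate-once idea can be illustrated with a small, self-contained
Java sketch. It is not the actual Impala code (the real logic lives in
the C++ backend); ArgExpr, InputHandler, open() and prepareRow() are
hypothetical names chosen for illustration only. Constant arguments are
written to the buffer in open(), and the per-row path refreshes only the
non-constant slots.

```java
import java.util.function.Function;

class ConstArgBufferSketch {
    /** Hypothetical argument expression: a constness flag plus an evaluator. */
    static final class ArgExpr {
        final boolean isConstant;
        private final Function<long[], Object> evaluator;
        int evalCount = 0; // number of times the expression was evaluated

        ArgExpr(boolean isConstant, Function<long[], Object> evaluator) {
            this.isConstant = isConstant;
            this.evaluator = evaluator;
        }

        Object evaluate(long[] row) {
            evalCount++;
            return evaluator.apply(row);
        }
    }

    /** Hypothetical input handler that owns the UDF's input buffer. */
    static final class InputHandler {
        private final ArgExpr[] args;
        private final Object[] inputBuffer;

        InputHandler(ArgExpr... args) {
            this.args = args;
            this.inputBuffer = new Object[args.length];
        }

        /** Runs once: evaluate and store every constant argument. */
        void open() {
            for (int i = 0; i < args.length; i++) {
                if (args[i].isConstant) inputBuffer[i] = args[i].evaluate(null);
            }
        }

        /** Runs per row: only non-constant slots are re-evaluated. */
        Object[] prepareRow(long[] row) {
            for (int i = 0; i < args.length; i++) {
                if (!args[i].isConstant) inputBuffer[i] = args[i].evaluate(row);
            }
            return inputBuffer;
        }
    }

    /** Returns {evaluations of the per-row arg, evaluations of the constant arg}. */
    static int[] evalCounts(int rows) {
        ArgExpr point = new ArgExpr(false, row -> row[0] + "," + row[1]);
        ArgExpr polygon = new ArgExpr(true, row -> "polygon ((0 0, 0 5, 5 5, 5 0, 0 0))");
        InputHandler handler = new InputHandler(point, polygon);
        handler.open();
        for (int i = 0; i < rows; i++) {
            handler.prepareRow(new long[] { i, i + 1 });
        }
        return new int[] { point.evalCount, polygon.evalCount };
    }
}
```

In this sketch the constant argument is evaluated once regardless of the
row count, while the per-row argument is evaluated once per row.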
The "isConstant" flags for arguments are also sent to the frontend,
where ConstantObjectInspectors are created for them, which enables any
constant-argument optimization the UDF implements on Hive's side.
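
To show why the constness signal matters to a UDF, here is a minimal
Java sketch of a const-aware function. It deliberately avoids the Hive
classes so that it compiles on its own: Inspector, ConstantInspector,
ColumnInspector and InRangeUdf are simplified stand-ins invented for
this example. In a real Hive GenericUDF the analogous check would be
whether an argument's ObjectInspector is a ConstantObjectInspector
inside initialize(). The toy function parses its range argument once
when that argument is constant, and on every call otherwise.

```java
class ConstAwareUdfSketch {
    /** Stand-in for an argument descriptor passed at initialization. */
    interface Inspector {}

    /** Stand-in for a descriptor that also carries the constant value. */
    static final class ConstantInspector implements Inspector {
        final Object value;
        ConstantInspector(Object value) { this.value = value; }
    }

    /** Stand-in for a descriptor of a per-row (non-constant) argument. */
    static final class ColumnInspector implements Inspector {}

    /** Toy UDF: in_range(x, "lo:hi") returns whether lo <= x <= hi. */
    static final class InRangeUdf {
        private long[] cachedBounds; // non-null only if argument 1 is constant
        int parseCount = 0;          // counts the "expensive" parsing work

        void initialize(Inspector[] args) {
            if (args[1] instanceof ConstantInspector) {
                // Constant argument: do the expensive work once, up front.
                cachedBounds = parse((String) ((ConstantInspector) args[1]).value);
            }
        }

        boolean evaluate(long x, String range) {
            long[] b = (cachedBounds != null) ? cachedBounds : parse(range);
            return b[0] <= x && x <= b[1];
        }

        private long[] parse(String range) {
            parseCount++;
            String[] parts = range.split(":");
            return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
        }
    }

    /** Runs the toy UDF over `rows` rows and reports how often it parsed. */
    static int parsesFor(boolean constantRange, int rows) {
        InRangeUdf udf = new InRangeUdf();
        Inspector rangeArg = constantRange
            ? new ConstantInspector("0:500000")
            : new ColumnInspector();
        udf.initialize(new Inspector[] { new ColumnInspector(), rangeArg });
        for (int i = 0; i < rows; i++) {
            udf.evaluate(i, "0:500000");
        }
        return udf.parseCount;
    }
}
```

With the constant flag set, the parse happens once in initialize();
without it, the same work is repeated for each row. The geospatial
functions gain from the same kind of precomputation, only with much
more expensive work than parsing a string.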
Moved input handling for Hive UDF calls to a separate class
HiveUdfInputHandler.
Benchmark:
Checked with the following:
set num_nodes=1; set mt_dop=1;
select count(*) from tpch.lineitem where st_intersects(
st_point(l_partkey, l_suppkey),
st_geomfromtext(
"polygon ((0 0, 0 500000, 500000 500000, 500000 0, 0 0))"
)
);
Before change: 3.15s (MaterializeTupleTime: 2s601ms)
After change: 1.54s (MaterializeTupleTime: 879.397ms)
Some geospatial queries show an even bigger performance gain:
select count(*) from tpch.lineitem where st_intersects(
st_point(l_partkey, l_suppkey),
st_buffer(st_geomfromtext("point (250000 250000)"), 250000)
);
Before change: MaterializeTupleTime: 19s266ms
After change: MaterializeTupleTime: 1s587ms
Note that st_intersects is optimized on the Hive side when one argument
is a constant, and this accounts for most of the performance gain.
Skipping the re-evaluation and re-copying is relatively insignificant:
measuring that part alone shows no measurable difference.
Testing:
- added a test UDF GenericAlltypeArgConstCheckUdf that prints
  const information about arguments
- added const arg check cases to generic-java-udf.test
- added legacy udf call tests for several types to java-udf.test
  to make sure it still works for all types
Change-Id: I4a6ca8c0bab499dffed88bb9786753da559af4c5
Reviewed-on: http://gerrit.cloudera.org:8080/23963
Reviewed-by: Csaba Ringhofer
Tested-by: Michael Smith
> Optimize constants in HiveUdfCall
> ---------------------------------
>
> Key: IMPALA-14575
> URL: https://issues.apache.org/jira/browse/IMPALA-14575
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend, Frontend
> Reporter: Csaba Ringhofer
> Assignee: Balazs Hevele
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> Hive can signal to a UDF that an argument is constant with
> ConstantObjectInspector:
> https://github.com/apache/hive/blob/d9ec04156d84bedbaa9f8dc40c27dbb88a3b9f49/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L1440
> https://github.com/apache/hive/blob/d9ec04156d84bedbaa9f8dc40c27dbb88a3b9f49/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L1512
> This is used by st_ relation functions to optimize the common case of
> comparing many geometries to a constant geometry:
> https://github.com/apache/hive/blob/d9ec04156d84bedbaa9f8dc40c27dbb88a3b9f49/ql/src/java/org/apache/hadoop/hive/ql/udf/esri/ST_GeometryRelational.java#L93
> Impala knows in C++ whether an argument is constant, but this information is
> not passed to the Java side:
> https://github.com/apache/impala/blob/2ac5a24dc0cfc9c9e7a1fc86cccf94cd1a2900af/common/thrift/Frontend.thrift#L41
> Optimizing constants could help by:
> - passing the argument from C++ to Java only once
> - allowing const-aware optimizations in UDFs
> TODO:
> only checked generic UDFs, not sure if this exists for legacy UDFs