GitHub user zellerh opened a pull request:
https://github.com/apache/trafodion/pull/1420
[TRAFODION-2912] Better handling of non-deterministic scalar UDFs
Fix some issues found by Andy Yang and others while writing a
non-deterministic scalar UDF (a random generator in this case).
The join invoking this UDF was transformed into a hash join, which
executes the UDF only once instead of once per row. Another problem is
the probe cache, which can also lead to a single execution instead of
one execution per row.
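To illustrate the problem, here is a minimal Python sketch (not Trafodion
code) of the three evaluation strategies mentioned above. All names
(make_udf, tsj_join, hash_join_like, probe_cached_join) are hypothetical,
and a call counter stands in for the random generator so the difference
is easy to see.

```python
import itertools


def make_udf():
    """Return a stand-in non-deterministic UDF and a log of its calls.

    Assumption: a counter replaces the random generator so that each
    call visibly returns a different value.
    """
    counter = itertools.count(1)
    calls = []

    def udf():
        value = next(counter)
        calls.append(value)
        return value

    return udf, calls


def tsj_join(outer_rows, udf):
    # Nested (TSJ-style) evaluation: the inner side, here the UDF,
    # is re-evaluated for every outer row.
    return [(row, udf()) for row in outer_rows]


def hash_join_like(outer_rows, udf):
    # Hash-join-style evaluation: the inner side is produced once,
    # then reused for every outer row.
    inner = [udf()]
    return [(row, value) for row in outer_rows for value in inner]


def probe_cached_join(outer_rows, probe_key, udf):
    # Probe-cache-style evaluation: results are remembered per probe
    # key, so repeated keys do not trigger another UDF call.
    cache = {}
    result = []
    for row in outer_rows:
        key = probe_key(row)
        if key not in cache:
            cache[key] = udf()
        result.append((row, cache[key]))
    return result
```

With three outer rows, tsj_join calls the UDF three times, while the
other two strategies can call it only once and hand every row the same
value.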
The fix records the non-deterministic UDF attribute in the group
attributes and adds checks in the normalizer that suppress the
conversion from a TSJ (tuple substitution join) to a non-TSJ when
non-deterministic UDFs are present. The probe cache logic already had
this check, so all that was needed there was to set the attribute.
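The shape of this fix can be sketched in Python as follows. This is an
illustration only, not the actual optimizer code: the Node class, the
calls_nondeterministic_udf flag and the helper functions are
hypothetical stand-ins for the group attributes and the normalizer and
probe cache checks.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical plan node; the flag stands in for the group attribute."""
    name: str
    children: list = field(default_factory=list)
    calls_nondeterministic_udf: bool = False


def has_nondeterministic_udf(node):
    # The attribute is visible on a node if the node itself or any
    # node below it invokes a non-deterministic UDF.
    return node.calls_nondeterministic_udf or any(
        has_nondeterministic_udf(child) for child in node.children
    )


def may_convert_tsj_to_non_tsj(tsj_node):
    # Normalizer-style check: keep the TSJ when a non-deterministic
    # UDF is present anywhere below it.
    return not has_nondeterministic_udf(tsj_node)


def may_use_probe_cache(inner_node):
    # Probe-cache-style check: the same attribute disables caching.
    return not has_nondeterministic_udf(inner_node)
```

For example, a TSJ whose inner child invokes the random generator would
fail may_convert_tsj_to_non_tsj and stay a TSJ, while a join over plain
scans would still be eligible for the conversion.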
Note that there may be more complex queries where we still won't
execute the UDF once per row. In general, there is no absolute
guarantee that a non-deterministic scalar UDF is executed once per row
(it is not even clear whether that would mean once per row of the
Cartesian product of all the joined tables). However, in simple cases
like the added test, we should try to call the UDF for every row that
satisfies the join predicates.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zellerh/trafodion bug/R23
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/trafodion/pull/1420.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1420
----
commit a725654560fabfafc2b81568720f43a24a1b3007
Author: Hans Zeller <hzeller@...>
Date: 2018-01-29T19:20:50Z
[TRAFODION-2912] Better handling of non-deterministic scalar UDFs
----