GitHub user zellerh opened a pull request:

    https://github.com/apache/trafodion/pull/1420

    [TRAFODION-2912] Better handling of non-deterministic scalar UDFs

    Fix some issues found by Andy Yang and others while writing a
    non-deterministic scalar UDF (a random generator in this case).
    
    This UDF was transformed into a hash join, which executes the UDF
    only once and not once per row. Another problem is the probe cache,
    which can also lead to a single execution instead of once per row.
    
    The fix records the non-deterministic UDF attribute in the group
    attributes and adds checks in the normalizer that suppress the
    conversion from a TSJ to a non-TSJ when non-deterministic UDFs are
    present. The probe cache logic already had this check, so all that
    was needed there was to set the attribute.
    
    Note that some more complex queries may still not execute the UDF
    once per row. In general, there is no absolute guarantee that a
    non-deterministic scalar UDF is executed once per row (and it is
    not even clear what "per row" should mean here: per row of the
    cartesian product of all the joined tables?). However, in simple
    cases like the added test, we should try to call the UDF for every
    row that satisfies the join predicates.
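
The difference between the two plan shapes described above can be illustrated with a minimal Python sketch. This is not Trafodion code: the names `make_counter_udf`, `tsj_plan`, `non_tsj_plan` and `choose_plan` are hypothetical, and a counter stands in for the random generator so the result is reproducible. It assumes the UDF takes no inputs from the outer rows, so a non-TSJ plan can evaluate it once and reuse the value, while a TSJ-style plan evaluates it for each outer row. The guard at the end only mimics the idea of the fix (stay with the TSJ when a non-deterministic UDF is present), not the actual normalizer logic.

```python
import itertools


def make_counter_udf():
    """Return a stand-in 'non-deterministic' UDF and its call log.

    A counter replaces a real random generator so the behavior is
    reproducible; every call returns a different value.
    """
    counter = itertools.count(1)
    calls = []

    def udf():
        value = next(counter)
        calls.append(value)
        return value

    return udf, calls


def tsj_plan(outer_rows, udf):
    """Nested (TSJ-style) evaluation: the UDF runs once per outer row."""
    return [(row, udf()) for row in outer_rows]


def non_tsj_plan(outer_rows, udf):
    """Hash-join-style evaluation: the UDF side is produced once and reused."""
    inner = udf()  # evaluated a single time
    return [(row, inner) for row in outer_rows]


def choose_plan(has_nondeterministic_udf):
    """Hypothetical guard: keep the TSJ form if such a UDF is present."""
    return tsj_plan if has_nondeterministic_udf else non_tsj_plan


rows = ["a", "b", "c"]

# Without the guard: one UDF call, every row sees the same value.
udf, calls = make_counter_udf()
unguarded = non_tsj_plan(rows, udf)
unguarded_calls = len(calls)

# With the guard: one UDF call, and one distinct value, per row.
udf, calls = make_counter_udf()
guarded = choose_plan(True)(rows, udf)
guarded_calls = len(calls)

print(unguarded, unguarded_calls)
print(guarded, guarded_calls)
```

In this sketch the unguarded plan calls the UDF once for three rows, while the guarded plan calls it three times, which is the behavior a user of a random-number UDF would expect in the simple case.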

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zellerh/trafodion bug/R23

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1420
    
----
commit a725654560fabfafc2b81568720f43a24a1b3007
Author: Hans Zeller <hzeller@...>
Date:   2018-01-29T19:20:50Z

    [TRAFODION-2912] Better handling of non-deterministic scalar UDFs
    

----

