[ 
https://issues.apache.org/jira/browse/TAJO-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159385#comment-14159385
 ] 

ASF GitHub Bot commented on TAJO-1092:
--------------------------------------

Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/178#issuecomment-57923814
  
    An example of a static method in a Java class is 
https://github.com/apache/tajo/pull/178/files#diff-f6daa76b2459470a9f3412131c0f726bR34.
    
    I designed the function annotation system to point to a Function Collection, 
which is a class containing multiple static functions. To add a user-defined 
or built-in function, just add a static method as in the example. This is very 
easy, and it enables Tajo to reuse existing functions.
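    
    As a rough illustration of the Function Collection idea (not the actual Tajo 
code), the sketch below scans a class for annotated static methods. The names 
```SqlFunction```, ```MathFunctions```, and ```FunctionScanner``` are hypothetical, and the 
annotation is a simplified stand-in that carries only a name.
    
    ```java
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.lang.reflect.Method;
    import java.lang.reflect.Modifier;
    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical, simplified annotation; it must be retained at runtime
    // so that reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SqlFunction {
      String name();
    }

    // A "function collection": one class holding several static functions.
    final class MathFunctions {
      private MathFunctions() {}

      @SqlFunction(name = "pow")
      static double pow(double x, double y) {
        return Math.pow(x, y);
      }

      @SqlFunction(name = "abs")
      static double abs(double x) {
        return Math.abs(x);
      }

      // Not annotated, so the scanner below ignores it.
      static double helper(double x) {
        return x;
      }
    }

    final class FunctionScanner {
      private FunctionScanner() {}

      // Collects every annotated static method of the class, keyed by SQL name.
      static Map<String, Method> scan(Class<?> collection) {
        Map<String, Method> found = new TreeMap<>();
        for (Method m : collection.getDeclaredMethods()) {
          SqlFunction meta = m.getAnnotation(SqlFunction.class);
          if (meta != null && Modifier.isStatic(m.getModifiers())) {
            found.put(meta.name(), m);
          }
        }
        return found;
      }
    }
    ```
    
    Calling ```FunctionScanner.scan(MathFunctions.class)``` yields entries for 
```abs``` and ```pow``` only, so registering a new function amounts to adding one 
annotated static method to the collection class.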
    
    Besides, SQL is based on three-valued logic 
(http://en.wikipedia.org/wiki/Three-valued_logic), so every value can be 
NULL. Even a boolean value can take one of three values: 
TRUE, FALSE, and UNKNOWN (NULL in SQL). In the current function system, each 
function must handle NULL values explicitly. Most functions return 
NULL if at least one parameter is NULL. The ```Substr``` function is an example 
(https://github.com/apache/tajo/blob/master/tajo-core/src/main/java/org/apache/tajo/engine/function/string/Substr.java#L63).
 This puts a burden on users, and it is easy to forget NULL handling when 
implementing user-defined functions.
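    
    To make the three-valued logic concrete, here is a minimal sketch that 
models UNKNOWN as a Java ```null``` in a ```Boolean```. The class name 
```ThreeValuedLogic``` is hypothetical and is not part of Tajo.
    
    ```java
    final class ThreeValuedLogic {
      private ThreeValuedLogic() {}

      // null stands for SQL UNKNOWN (NULL).
      static Boolean and(Boolean a, Boolean b) {
        // FALSE dominates AND, even when the other operand is UNKNOWN.
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
          return false;
        }
        if (a == null || b == null) {
          return null;
        }
        return true;
      }

      static Boolean or(Boolean a, Boolean b) {
        // TRUE dominates OR, even when the other operand is UNKNOWN.
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
          return true;
        }
        if (a == null || b == null) {
          return null;
        }
        return false;
      }

      static Boolean not(Boolean a) {
        // NOT UNKNOWN stays UNKNOWN.
        return a == null ? null : !a;
      }
    }
    ```
    
    Note that ```and(false, null)``` is FALSE while ```and(true, null)``` is 
UNKNOWN, so "return NULL if any parameter is NULL" is common but not universal. 
This is one reason a function may need to see NULL inputs itself.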
    
    To mitigate this problem and to make function invocation more 
efficient, I designed a new function binder and a new function definition 
approach that carries hints about how a function handles NULL values.
    
    The hints are expressed through the parameter types in a function definition. 
You specify them by declaring each parameter as either a Java primitive type or 
a class primitive type (a wrapper class such as Double), depending on how that 
parameter should handle NULL. 
    
    For example:
    
    This ```pow``` function does not allow NULL values as input parameters. In 
this case, if at least one parameter is NULL, the function binder 
automatically returns NULL without invoking the function. So, the 
function itself does not need to handle NULL values explicitly.
    ```
    @ScalarFunction(name = "pow", returnType = FLOAT8,
        paramTypes = {FLOAT8, FLOAT8})
    public static double pow(double x, double y) {
      return Math.pow(x, y);
    }
    ```
    
    The following function definition allows NULL values for both input 
parameters. In this case, the function must handle NULL values explicitly.
    ```
    @ScalarFunction(name = "pow", returnType = FLOAT8,
        paramTypes = {FLOAT8, FLOAT8})
    public static Double pow(Double x, Double y) {
      if (x == null || y == null) {
        return null;
      }
      return Math.pow(x, y);
    }
    ```
    
    In addition, the function binder allows mixing primitive types and 
class primitive types in one definition. In that case, only the parameters 
declared with class primitive types can receive NULL, and the function must 
handle it explicitly for those parameters.
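    
    The binding rule can be sketched with plain reflection. This is only an 
illustration: the real binder is generated as bytecode, the names 
```NullAwareBinder```, ```call```, ```MixedFunctions```, and ```powWithDefault``` are 
hypothetical, and the sketch assumes that a NULL bound to a primitive parameter 
short-circuits to NULL exactly as in the first ```pow``` example.
    
    ```java
    import java.lang.reflect.Method;

    final class NullAwareBinder {
      private NullAwareBinder() {}

      // Finds a static method by name and arity, then applies the NULL rule.
      static Object call(Class<?> owner, String name, Object... args) {
        for (Method fn : owner.getDeclaredMethods()) {
          if (!fn.getName().equals(name) || fn.getParameterCount() != args.length) {
            continue;
          }
          Class<?>[] paramTypes = fn.getParameterTypes();
          for (int i = 0; i < args.length; i++) {
            // A primitive parameter cannot hold NULL, so skip the call entirely.
            if (args[i] == null && paramTypes[i].isPrimitive()) {
              return null;
            }
          }
          try {
            // Wrapper-typed parameters receive null as-is.
            return fn.invoke(null, args);
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + name, e);
          }
        }
        throw new IllegalArgumentException("no such function: " + name);
      }
    }

    final class MixedFunctions {
      private MixedFunctions() {}

      // x is primitive, so NULL never reaches it; y is a wrapper and may be null.
      static double powWithDefault(double x, Double y) {
        return Math.pow(x, y == null ? 1.0 : y);
      }
    }
    ```
    
    With this sketch, ```call(MixedFunctions.class, "powWithDefault", null, 3.0)``` 
returns NULL without invoking the method, while passing NULL as the second 
argument does invoke it and lets the function choose its own default.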
    
    Finally, the function binder is generated on the fly using Java bytecode 
generation, so it does not add any overhead even though the binding logic 
is very complex. I also expect this idea to significantly reduce 
the overhead of Datum use in the existing function system.


> Improve the function system to allow other function implementation types
> ------------------------------------------------------------------------
>
>                 Key: TAJO-1092
>                 URL: https://issues.apache.org/jira/browse/TAJO-1092
>             Project: Tajo
>          Issue Type: Improvement
>          Components: function/udf
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.9.1, block_iteration
>
>
> In the current function system, each function implementation is a single Java 
> class subclassed from org.apache.tajo.catalog.function.Function. 
> This approach leaves much room for improvement. It always 
> uses Datum for the input and output values of functions, creating unnecessary 
> objects. It is also unlikely to exploit information available in the query 
> statement; for example, whether a parameter is a constant or a variable.
> In this issue, I propose improving the function system to 
> support other function implementation types. Specifically, I propose three 
> function implementation types:
> - legacy Java class function provided by the current Tajo
> - static method in Java class
> - code generation by ASM
> Later, we could expand this feature to allow Pig or Hive functions in Tajo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
