[ 
https://issues.apache.org/jira/browse/TAJO-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502318#comment-14502318
 ] 

Jihoon Son commented on TAJO-1562:
----------------------------------

Hi guys. This is the first proposal.
Honestly, I'm not very familiar with Python, so this proposal may look odd. 
Any suggestions and comments are welcome.

I investigated several Python features and concluded that Python classes are 
an appropriate way to support UDAFs. That is, users can define a new UDAF 
by defining a Python class that inherits from a predefined AbstractUdaf class.
Here is an example.

*AbstractUdaf class*
{code}
from tajo_util import output_type


class AbstractUdaf:

    def __init__(self):
        return

    @output_type('text')
    def name(self):
        """Return the function name"""
        return

    def eval(self, item):
        """Eval item at the first stage"""
        return

    def merge(self, item):
        """Merge the result of the first stage"""
        return

    def terminate(self):
        """Get the final result"""
        return
{code}

*SumPy class Example*
{code}
from tajo_util import output_type
from tajo_udaf import AbstractUdaf


class SumPy(AbstractUdaf):
    # kept under a different attribute than name(), since the method would shadow it
    func_name = 'sum_py'
    aggregated = 0

    # return the function name
    @output_type('text')
    def name(self):
        return self.func_name

    # eval at the first stage
    @output_type('int8')
    def eval(self, item):
        self.aggregated += item

    # merge the result of the first stage
    @output_type('int8')
    def merge(self, item):
        self.aggregated += item

    # get the final result
    @output_type('int8')
    def terminate(self):
        return self.aggregated
{code}
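
To make the intended call sequence concrete, here is a rough sketch of how an engine might drive such a class across the two stages. It is only an illustration under assumptions: output_type is stubbed locally because tajo_util is part of this proposal and does not exist yet, the base class and name() are left out so the snippet runs on its own, and run_two_stage is a hypothetical helper, not an existing Tajo API.

```python
def output_type(type_name):
    """Stand-in for the proposed tajo_util.output_type (assumption): only records the type."""
    def decorator(func):
        func.output_type = type_name
        return func
    return decorator


class SumPy:
    def __init__(self):
        self.aggregated = 0

    @output_type('int8')
    def eval(self, item):
        # first stage: consume one input value
        self.aggregated += item

    @output_type('int8')
    def merge(self, item):
        # second stage: consume one partial result
        self.aggregated += item

    @output_type('int8')
    def terminate(self):
        return self.aggregated


def run_two_stage(udaf_class, partitions):
    """Hypothetical driver: one instance per partition, then one merging instance."""
    partials = []
    for rows in partitions:
        udaf = udaf_class()
        for row in rows:
            udaf.eval(row)
        # simplification: reuse terminate() as the partial result
        partials.append(udaf.terminate())
    final = udaf_class()
    for partial in partials:
        final.merge(partial)
    return final.terminate()


result = run_two_stage(SumPy, [[1, 2, 3], [4, 5], []])
```

Reusing terminate() for the partial result only works when the partial and final values have the same shape, as with a sum. Aggregates that need more intermediate state would require a separate way to hand that state over.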

To support this form of UDAF, we need a general way to carry the aggregated 
state (e.g., aggregated in SumPy) between the different stages. I think this 
can be solved by serializing and deserializing that state as a tuple.
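
As a rough sketch of the tuple idea, consider an average, whose intermediate state has two fields. Everything here is an assumption for illustration: AvgPy, the get_state hook, and the ship helper are hypothetical names, and JSON merely stands in for whatever wire format would actually be used between stages.

```python
import json


class AvgPy:
    """Hypothetical UDAF whose intermediate state needs two fields."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def eval(self, item):
        # first stage: consume one input value
        self.total += item
        self.count += 1

    def get_state(self):
        # assumed hook: pack every intermediate field into one tuple
        return (self.total, self.count)

    def merge(self, state):
        # second stage: unpack a tuple produced by get_state()
        total, count = state
        self.total += total
        self.count += count

    def terminate(self):
        return self.total / self.count if self.count else None


def ship(state):
    """Simulate sending the state between stages (JSON is a stand-in format)."""
    return tuple(json.loads(json.dumps(state)))


first_a, first_b = AvgPy(), AvgPy()
for value in [1, 2, 3]:
    first_a.eval(value)
for value in [4, 6]:
    first_b.eval(value)

final = AvgPy()
for worker in (first_a, first_b):
    final.merge(ship(worker.get_state()))

average = final.terminate()
```

With a hook like this, the engine would not need to know which attributes a particular UDAF keeps; it would only move an opaque tuple from the first stage to the second.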

> Python UDAF support
> -------------------
>
>                 Key: TAJO-1562
>                 URL: https://issues.apache.org/jira/browse/TAJO-1562
>             Project: Tajo
>          Issue Type: New Feature
>          Components: function/udf
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11.0
>
>
> We need to support Python UDAF as well as UDF (TAJO-1344). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
