[ https://issues.apache.org/jira/browse/TAJO-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502318#comment-14502318 ]
Jihoon Son commented on TAJO-1562:
----------------------------------
Hi guys. This is a first proposal.
Honestly, I'm not very familiar with Python, so this proposal may look a bit odd.
Any suggestions and comments are welcome.
After investigating several Python features, I think Python classes are an
appropriate way to support UDAFs. That is, users can define a new UDAF by
writing a Python class that inherits from a predefined AbstractUdaf class.
Here is an example.
*AbstractUdaf class*
{code}
from tajo_util import output_type


class AbstractUdaf:
    def __init__(self):
        return

    @output_type('text')
    def name(self):
        """Return the function name"""
        return

    def eval(self, item):
        """Eval item at the first stage"""
        return

    def merge(self, item):
        """Merge the result of the first stage"""
        return

    def terminate(self):
        """Get the final result"""
        return
{code}
*SumPy class Example*
{code}
from tajo_util import output_type
from tajo_udaf import AbstractUdaf


class SumPy(AbstractUdaf):
    # Stored as _name: a class attribute called 'name' would be
    # replaced by the name() method defined below.
    _name = 'sum_py'

    def __init__(self):
        AbstractUdaf.__init__(self)
        # per-instance accumulator, reset for every new aggregation
        self.aggregated = 0

    # return the function name
    @output_type('text')
    def name(self):
        return self._name

    # eval at the first stage
    @output_type('int8')
    def eval(self, item):
        self.aggregated += item

    # merge the result of the first stage
    @output_type('int8')
    def merge(self, item):
        self.aggregated += item

    # get the final result
    @output_type('int8')
    def terminate(self):
        return self.aggregated
{code}
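To make the stage flow concrete, here is a minimal, self-contained sketch of how an engine might drive such a class. It is not Tajo's implementation: `output_type` is a local stand-in for `tajo_util.output_type` (which only exists inside Tajo), SumPy is repeated without the `AbstractUdaf` base so the snippet runs on its own, and `run_two_stage` is a hypothetical driver that assumes each first-stage instance hands its partial result to the second stage through `terminate()`.

```python
# Stand-in for tajo_util.output_type; it only records the declared type.
def output_type(type_name):
    def decorator(func):
        func.output_type = type_name
        return func
    return decorator


class SumPy(object):
    # Same logic as the SumPy example, minus the AbstractUdaf base class.
    def __init__(self):
        self.aggregated = 0

    @output_type('int8')
    def eval(self, item):
        self.aggregated += item

    @output_type('int8')
    def merge(self, item):
        self.aggregated += item

    @output_type('int8')
    def terminate(self):
        return self.aggregated


def run_two_stage(udaf_cls, partitions):
    """Hypothetical driver: one first-stage instance per partition, one final instance."""
    final = udaf_cls()
    for partition in partitions:
        partial = udaf_cls()
        for value in partition:
            partial.eval(value)  # first stage
        # Hand-off (assumption): the partial result is read via terminate().
        final.merge(partial.terminate())
    return final.terminate()


print(run_two_stage(SumPy, [[1, 2, 3], [4, 5]]))  # 15
```

For SUM, the partial and final results are the same kind of value, so handing off `terminate()` is enough; aggregates such as AVG need more than one intermediate value per partition.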
To support UDAFs of this form, we need a general way to maintain the
intermediate aggregated values (e.g., aggregated in SumPy) between stages. I
think this can be solved by serializing and deserializing them as a tuple.
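A minimal sketch of that idea, assuming (hypothetically) that every UDAF sets all of its intermediate values in `__init__`: the engine packs an instance's attributes into a tuple after the first stage and, in the second stage, passes each tuple's fields to `merge()` as positional arguments. `serialize_state`, `merge_states`, and `AvgPy` are illustrative names, not Tajo APIs; a plain Python tuple stands in for whatever row format Tajo would actually use, and the `output_type` decorators are omitted for brevity. It relies on `vars()` preserving attribute assignment order, which holds for Python 3.7+.

```python
def serialize_state(udaf):
    """Pack a UDAF's intermediate values into a tuple.

    vars() lists instance attributes in the order they were first
    assigned (Python 3.7+), i.e. the order used in __init__.
    """
    return tuple(vars(udaf).values())


def merge_states(udaf_cls, states):
    """Second stage: merge every first-stage tuple into one fresh instance."""
    final = udaf_cls()
    for state in states:
        final.merge(*state)  # tuple fields map to merge()'s parameters in order
    return final.terminate()


class AvgPy(object):
    """Hypothetical UDAF whose intermediate result needs two values."""

    def __init__(self):
        self.total = 0  # sum of the values seen so far
        self.count = 0  # number of values seen so far

    def eval(self, item):
        self.total += item
        self.count += 1

    def merge(self, total, count):
        self.total += total
        self.count += count

    def terminate(self):
        return self.total / self.count if self.count else None


# First stage: one instance per partition, each serialized after eval().
partials = []
for partition in ([1, 2, 3], [4, 5]):
    udaf = AvgPy()
    for value in partition:
        udaf.eval(value)
    partials.append(serialize_state(udaf))

print(partials)                       # [(6, 3), (9, 2)]
print(merge_states(AvgPy, partials))  # 3.0
```

Under this convention, a SumPy that initializes `aggregated` in `__init__` serializes to a one-field tuple, so its existing `merge(self, item)` would receive the partial sum unchanged.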
> Python UDAF support
> -------------------
>
> Key: TAJO-1562
> URL: https://issues.apache.org/jira/browse/TAJO-1562
> Project: Tajo
> Issue Type: New Feature
> Components: function/udf
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Fix For: 0.11.0
>
>
> We need to support Python UDAF as well as UDF (TAJO-1344).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)