[
https://issues.apache.org/jira/browse/HIVEMALL-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682765#comment-15682765
]
Takeshi Yamamuro commented on HIVEMALL-20:
------------------------------------------
This description does not propose codegen for the Hivemall functions themselves, and
Hivemall needs no changes to support that codegen. Re-implementing Hivemall
functions as Spark-native ones would be nice, but the cost of re-implementation
and maintenance seems high to me. If the overheads of Hive UDFs and Spark UDFs
were nearly the same, we would not need to replace them.
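For reference, a minimal sketch in Scala of the two call paths being compared: Hivemall's sigmoid invoked as a Hive UDF through Spark's Hive UDF adapter, and the same math written with native Spark expressions that benefit from whole-stage codegen. The Hivemall class name, the temporary function name, and the Hivemall jar being on the classpath are assumptions for illustration, not part of this ticket.
{code:scala}
// Sketch only: assumes the Hivemall jar is on the classpath and that
// 'hivemall.tools.math.SigmoidGenericUDF' is the sigmoid UDF class.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SigmoidCallPaths {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sigmoid-call-paths")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    val df = spark.range(0L, 1000000L).select($"id".cast("double").as("x"))
    df.createOrReplaceTempView("t")

    // Path 1: the Hive UDF is invoked through Spark's Hive UDF wrapper,
    // which is where the overhead discussed in this ticket comes from.
    spark.sql("CREATE TEMPORARY FUNCTION hivemall_sigmoid AS 'hivemall.tools.math.SigmoidGenericUDF'")
    spark.sql("SELECT SUM(hivemall_sigmoid(x)) FROM t").show()

    // Path 2: the same logic as native Spark expressions, which are
    // compiled by whole-stage codegen.
    df.agg(sum(lit(1.0) / (lit(1.0) + exp(-$"x")))).show()

    spark.stop()
  }
}
{code}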
> Improve the performance of Hive integration in Spark
> ----------------------------------------------------
>
> Key: HIVEMALL-20
> URL: https://issues.apache.org/jira/browse/HIVEMALL-20
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Takeshi Yamamuro
> Assignee: Takeshi Yamamuro
>
> Most Hivemall functions depend on Hive interfaces (UDF, GenericUDF,
> GenericUDTF, ...), but Spark currently incurs overhead when calling these interfaces
> (https://github.com/myui/hivemall/blob/master/spark/spark-2.0/src/test/scala/org/apache/spark/sql/hive/benchmark/MiscBenchmark.scala).
> Therefore, some functions such as sigmoid and each_top_k have been
> re-implemented as native Spark functionality. This re-implementation seems to
> hurt maintainability, so we would be better off reducing these overheads in Spark.
> This ticket tracks all the related activities.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)