[jira] [Commented] (HIVEMALL-20) Improve the performance of Hive integration in Spark

Takeshi Yamamuro (JIRA) Sun, 20 Nov 2016 22:36:35 -0800

    [ https://issues.apache.org/jira/browse/HIVEMALL-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682687#comment-15682687 ]


Takeshi Yamamuro commented on HIVEMALL-20:
------------------------------------------

I opened a PR to support codegen for UDF and GenericUDF (SPARK-18478). 
`Generator` is a Spark plan node that `GenericUDTF` depends on, and this node 
recently gained codegen support in SPARK-15214. So, we can also improve 
`GenericUDTF` performance along with SPARK-18478.

> Improve the performance of Hive integration in Spark
> ----------------------------------------------------
>
>                 Key: HIVEMALL-20
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-20
>             Project: Hivemall
>          Issue Type: Improvement
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>
> Most Hivemall functions depend on Hive interfaces (UDF, GenericUDF, 
> GenericUDTF, ...), but Spark currently incurs overhead when calling these interfaces 
> (https://github.com/myui/hivemall/blob/master/spark/spark-2.0/src/test/scala/org/apache/spark/sql/hive/benchmark/MiscBenchmark.scala).
> Therefore, some functions such as sigmoid and each_top_k have been 
> re-implemented as native Spark functionality. This re-implementation seems to 
> worsen maintainability, so we would be better off reducing the overhead in Spark. 
> This ticket tracks all the related activities.
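
As a rough illustration of the kind of call overhead described above, the following Java sketch contrasts a sigmoid invoked through a generic, object-based interface with a direct primitive call. The `GenericFunction` interface and the class name are hypothetical and are not Hive or Spark APIs. That boxing and per-call argument handling contribute to the cost is an assumption made for illustration, not a claim from the ticket.

```java
public class SigmoidCallSketch {
    // Hypothetical stand-in for an object-based UDF interface (not a Hive API).
    interface GenericFunction {
        Object evaluate(Object[] args);
    }

    // Generic path: the argument arrives boxed in an array and must be
    // cast and unboxed on every call; the result is boxed again on return.
    static final GenericFunction GENERIC_SIGMOID = args -> {
        double x = ((Number) args[0]).doubleValue();
        return 1.0 / (1.0 + Math.exp(-x));
    };

    // Direct path: primitive in, primitive out, no intermediate objects.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double viaGeneric = (Double) GENERIC_SIGMOID.evaluate(new Object[] {0.0});
        double viaDirect = sigmoid(0.0);
        // Both paths compute the same value; only the calling convention differs.
        System.out.println(viaGeneric + " " + viaDirect);
    }
}
```

Both paths return the same result, so the sketch only shows where extra per-row work can appear. It does not measure anything, and the linked benchmark remains the reference for actual numbers.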



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
