[jira] [Created] (HIVEMALL-20) Improve the performance of Hive integration in Spark

Takeshi Yamamuro (JIRA) Sun, 20 Nov 2016 22:27:37 -0800

Takeshi Yamamuro created HIVEMALL-20:
----------------------------------------


             Summary: Improve the performance of Hive integration in Spark
                 Key: HIVEMALL-20
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-20
             Project: Hivemall
          Issue Type: Improvement
            Reporter: Takeshi Yamamuro
            Assignee: Takeshi Yamamuro


Most Hivemall functions depend on Hive interfaces (UDF, GenericUDF, 
GenericUDTF, ...), but Spark currently incurs overhead when calling these interfaces 
(see https://github.com/myui/hivemall/blob/master/spark/spark-2.0/src/test/scala/org/apache/spark/sql/hive/benchmark/MiscBenchmark.scala).
Therefore, some functions, such as sigmoid and each_top_k, have been 
re-implemented as native Spark functionality. This re-implementation seems to 
worsen maintainability, so we would be better off reducing the overhead in Spark itself. 
This ticket tracks all the related activities.
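
To make the trade-off concrete, here is a toy model in plain Python of the two call paths for sigmoid. It is only a sketch under stated assumptions: it uses neither Spark nor Hive, the names DeferredValue, GenericSigmoid, via_generic_interface and via_native are hypothetical, and the per-row boxing is an assumed stand-in for the kind of wrapping a generic UDF interface may require, not a description of what Spark actually does.

```python
import math


def sigmoid(x: float) -> float:
    """Native path: a plain scalar function, sigmoid(x) = 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so that
    # math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)


class DeferredValue:
    """Hypothetical box standing in for a wrapped UDF argument."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class GenericSigmoid:
    """Hypothetical generic-interface wrapper: arguments arrive boxed."""

    def evaluate(self, args):
        # Unwrap and convert the argument on every call.
        x = float(args[0].get())
        return sigmoid(x)


def via_generic_interface(rows):
    """Call sigmoid through the wrapper, boxing each row (assumed overhead)."""
    udf = GenericSigmoid()
    return [udf.evaluate([DeferredValue(x)]) for x in rows]


def via_native(rows):
    """Call sigmoid directly, with no boxing."""
    return [sigmoid(x) for x in rows]
```

Both paths return the same values; the generic path only adds per-row allocation and indirection. Timing the two functions over a large list (for example with timeit) gives a rough feel for that kind of cost, although the actual numbers for Spark have to come from the benchmark linked above.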



