Takeshi Yamamuro created HIVEMALL-20:
----------------------------------------
Summary: Improve the performance of Hive integration in Spark
Key: HIVEMALL-20
URL: https://issues.apache.org/jira/browse/HIVEMALL-20
Project: Hivemall
Issue Type: Improvement
Reporter: Takeshi Yamamuro
Assignee: Takeshi Yamamuro
Most Hivemall functions depend on Hive interfaces (UDF, GenericUDF,
GenericUDTF, ...), but Spark currently incurs overhead when calling these
interfaces
(https://github.com/myui/hivemall/blob/master/spark/spark-2.0/src/test/scala/org/apache/spark/sql/hive/benchmark/MiscBenchmark.scala).
As a workaround, some functions such as sigmoid and each_top_k have been
re-implemented as native Spark functionality. These re-implementations hurt
maintainability, because each such function then has two implementations to
keep in sync, so it would be better to reduce the overhead in Spark itself.
This ticket tracks all related activities.