[
https://issues.apache.org/jira/browse/HIVE-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547189#comment-14547189
]
Edward Capriolo commented on HIVE-10712:
----------------------------------------
I have a question I want us all to consider. Hive currently has three execution
engines. What is the value of adding a fourth one? I know on the one hand that
Hive is an open source project and we do not want to be outright rejecting
ideas and directions, but we have to ask ourselves: is Flink so significantly
different from Spark or Tez that we can justify the addition? For the project,
another engine means more code, more dependencies, and more tests.
The project is already divided along the lines of supporting Hive-on-Tez and
Hive-on-Spark. What is the value of a third camp? Hive supports many different
queries, but if Flink basically delivers the same performance as one of the
existing back ends on the majority of them, I do not think it is a good
direction. What if a 4th or 5th group comes up with its own "execution engine":
Hive-on-Storm, Hive-on-Samza, Hive-on-eds-query-engine?
What value does an end user get from having to choose among this many engines,
facing conflicting advice from different people over which one to use, as well
as conflicting debates across the community as to which is the fastest/best?
At this point I would like to have a real justification as to why we should add
a 4th engine. For me not to be -1, we need some examples of a serious feature
in Flink that makes a large number of end-user queries faster/better. Otherwise
I think this is just an academic pursuit that will further fragment us.
> Hive on Apache Flink
> --------------------
>
> Key: HIVE-10712
> URL: https://issues.apache.org/jira/browse/HIVE-10712
> Project: Hive
> Issue Type: Wish
> Reporter: Greg Senia
>
> Flink, an open-source data analytics cluster computing framework, has gained
> some momentum recently. This initiative will provide users a new alternative
> so that those users can consolidate their backend.
> Secondly, providing such an alternative further increases Hive's adoption, as
> it exposes Flink users to a viable, feature-rich, de facto standard SQL tool
> on Hadoop.
> Finally, allowing Hive to run on Flink also has performance benefits. Hive
> queries, especially those involving multiple reducer stages, will run faster,
> thus improving user experience as Tez and Spark do.
> This is an umbrella JIRA which will cover many upcoming subtasks. Feedback
> from the community is greatly appreciated!