[jira] [Comment Edited] (HIVE-10712) Hive on Apache Flink

Edward Capriolo (JIRA) Sun, 17 May 2015 07:37:41 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547189#comment-14547189
 ]


Edward Capriolo edited comment on HIVE-10712 at 5/17/15 2:36 PM:
-----------------------------------------------------------------

I have a question I want us all to consider. Hive currently has three execution 
engines. What is the value of adding a fourth one? I know, on the one hand, that 
Hive is an open source project and we do not want to be outright rejecting ideas 
and directions, but we have to ask ourselves: is Flink so significantly different 
from Spark or Tez that we can justify the addition? With another engine, the 
project takes on more code, more dependencies, and more tests.

The project is already divided along the lines of supporting hive-on-tez and 
hive-on-spark. What is the value of a third camp? Hive supports many different 
queries, but if Flink basically delivers the same performance as one of the 
existing back-ends on the majority of those queries, I do not think it is a good 
direction. What if a 4th or 5th group comes up with its own "execution engine": 
Hive on storm, hive-on-samza, hive-on-eds-query-engine?

What value does an end user get from having to choose among this many 
engines, when they face conflicting advice from different people over which 
one they should use, as well as conflicting debates across the community as to 
which is the fastest/best?

At this point I would like to have a real justification as to why we should add 
a 4th engine, for me not to be -1. We need some examples of a serious feature 
in Flink that makes a large number of end-user queries faster/better; otherwise 
I think this is just an academic pursuit that will further fragment us, and 
every data processing platform that has a map and reduce primitive can lobby 
for inclusion in Hive.

 





> Hive on Apache Flink
> --------------------
>
>                 Key: HIVE-10712
>                 URL: https://issues.apache.org/jira/browse/HIVE-10712
>             Project: Hive
>          Issue Type: Wish
>            Reporter: Greg Senia
>
> Flink, an open-source data analytics cluster computing framework, has gained 
> some momentum recently. This initiative will provide users a new alternative 
> so that they can consolidate their backend.
> Secondly, providing such an alternative further increases Hive's adoption, as 
> it exposes Flink users to a viable, feature-rich, de facto standard SQL tool 
> on Hadoop.
> Finally, allowing Hive to run on Flink also has performance benefits. Hive 
> queries, especially those involving multiple reducer stages, will run faster, 
> thus improving user experience as Tez/Spark do.
> This is an umbrella JIRA which will cover many coming subtasks. Feedback from 
> the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
