Thanks, Dean, for the information.

Hive-on-Spark is nice. Spark SQL, though, has the advantage of exposing the 
full power of Spark: it lets users manipulate Hive tables as RDDs through 
native Spark support.
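
For example, roughly something like this (just a sketch in the spark-shell; 
the "sales" table and its columns are made up, and it assumes a 1.1/1.2-era 
HiveContext):

  import org.apache.spark.SparkContext._   // pair-RDD operations such as reduceByKey
  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)    // sc is the shell's SparkContext

  // Run a query against an existing Hive table through Spark SQL...
  val sales = hiveContext.sql("SELECT user_id, amount FROM sales")

  // ...then treat the result as an ordinary RDD of Rows and apply
  // native Spark transformations to it.
  val totals = sales
    .map(row => (row.getString(0), row.getDouble(1)))
    .reduceByKey(_ + _)

  totals.take(10).foreach(println)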

When I tried to upgrade the current Hive 0.13.1 support to Hive 0.14.0, I 
found that the Hive parser is no longer compatible. Moreover, the new features 
introduced in Hive 0.14.0, e.g., ACID transactions, are not available yet. At 
the same time, Spark 1.2 adds some nice features that the thrift-server also 
supports, e.g., Hive 0.13 compatibility, table caching, etc.
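
For instance, the table caching can also be driven programmatically through 
the same HiveContext (again just a sketch, reusing the made-up "sales" table 
from above):

  // Pin the table in Spark's in-memory columnar cache; subsequent
  // queries against it are served from memory.
  hiveContext.cacheTable("sales")
  hiveContext.sql("SELECT user_id, SUM(amount) FROM sales GROUP BY user_id").collect()

  // Release the memory when done.
  hiveContext.uncacheTable("sales")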

Given that both projects keep adding features, it would be great if users 
could take advantage of both. Currently, Spark SQL gives us such benefits only 
partially, and I am wondering how to maintain this integration in the long term.

Thanks.

Zhan Zhang

On Nov 21, 2014, at 3:12 PM, Dean Wampler <deanwamp...@gmail.com> wrote:

> I can't comment on plans for Spark SQL's support for Hive, but several
> companies are porting Hive itself onto Spark:
> 
> http://blog.cloudera.com/blog/2014/11/apache-hive-on-apache-spark-the-first-demo/
> 
> I'm not sure if they are leveraging the old Shark code base or not, but it
> appears to be a fresh effort.
> 
> dean
> 
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
> 
> On Fri, Nov 21, 2014 at 2:51 PM, Zhan Zhang <zhaz...@gmail.com> wrote:
> 
>> Spark and Hive integration is now a very nice feature, but I am wondering
>> what the long-term roadmap is for Spark's integration with Hive. Both
>> projects are undergoing fast improvement and change. Currently, my
>> understanding is that the Spark SQL Hive support relies on the Hive
>> metastore and a basic parser to operate, and the thrift-server intercepts
>> Hive queries and executes them with its own engine.
>> 
>> With every release of Hive, significant effort is needed on the Spark side
>> to support it.
>> 
>> For the metastore part, we could possibly replace it with HCatalog. But
>> given how other parts depend on Hive, e.g., the metastore and the
>> thrift-server, HCatalog may not be able to help much.
>> 
>> Does anyone have any insight or idea in mind?
>> 
>> Thanks.
>> 
>> Zhan Zhang
>> 
>> 
>> 
>> 


