> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
It mostly works, but I have been having issues with tables that contain a large amount of data: https://issues.apache.org/jira/browse/SPARK-6910

> On May 27, 2015, at 20:52, Sanjay Subramanian
> <sanjaysubraman...@yahoo.com.INVALID> wrote:
>
> hey guys
>
> On the Hive/Hadoop ecosystem we are using the Cloudera distribution CDH 5.2.x;
> there are about 300+ Hive tables.
> The data is stored as text (moving slowly to Parquet) on HDFS.
> I want to use SparkSQL to point at the Hive metadata and be able to define
> JOINs etc. using a programming structure like this:
>
> import org.apache.spark.sql.hive.HiveContext
> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
>
> Is that the way to go? Some guidance would be great.
>
> thanks
>
> sanjay
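
For what it's worth, a minimal sketch of that pattern in spark-shell might look like the one below. It assumes Spark 1.3.x as shipped with CDH, a hive-site.xml on the classpath pointing at your metastore, and `sc` being the SparkContext the shell provides; the table and column names (orders, customers, etc.) are hypothetical placeholders for your own Hive tables.

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext reads the metastore configuration from hive-site.xml,
    // so the 300+ existing Hive tables are queryable by name.
    val sqlContext = new HiveContext(sc)

    // Joins can be expressed directly in HiveQL; the metastore resolves
    // the table names. (Hypothetical tables, for illustration only.)
    val joined = sqlContext.sql(
      """SELECT o.order_id, c.name, o.amount
        |FROM orders o
        |JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

    joined.show()

    // Results can be written back as Parquet, which fits the gradual
    // migration away from text storage mentioned in the question.
    joined.saveAsParquetFile("/user/hive/warehouse/orders_joined_parquet")

This is just a sketch, not a recommendation on tuning; for very large tables see the SPARK-6910 issue linked above.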