> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
It mostly works, but I have been having issues with tables that contain a large amount of data: https://issues.apache.org/jira/browse/SPARK-6910

> On May 27, 2015, at 20:52, Sanjay Subramanian
> <sanjaysubraman...@yahoo.com.INVALID> wrote:
>
> hey guys
>
> On the Hive/Hadoop ecosystem we are using the Cloudera distribution CDH 5.2.x;
> there are about 300+ Hive tables.
> The data is stored as text (moving slowly to Parquet) on HDFS.
> I want to use SparkSQL to point at the Hive metadata and be able to define
> JOINs etc. using a programming structure like this:
>
> import org.apache.spark.sql.hive.HiveContext
> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
>
> Is that the way to go? Some guidance would be great.
>
> thanks
>
> sanjay
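
For what it's worth, a minimal sketch of that pattern in spark-shell might look like the one below. It assumes Spark 1.3.x as shipped with CDH, a hive-site.xml on the classpath pointing at your metastore, and `sc` being the SparkContext the shell provides; the table and column names (orders, customers, etc.) are hypothetical placeholders for your own Hive tables.

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext reads the metastore configuration from hive-site.xml,
    // so the 300+ existing Hive tables are queryable by name.
    val sqlContext = new HiveContext(sc)

    // Joins can be expressed directly in HiveQL; the metastore resolves
    // the table names. (Hypothetical tables, for illustration only.)
    val joined = sqlContext.sql(
      """SELECT o.order_id, c.name, o.amount
        |FROM orders o
        |JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

    joined.show()

    // Results can be written back as Parquet, which fits the gradual
    // migration away from text storage mentioned in the question.
    joined.saveAsParquetFile("/user/hive/warehouse/orders_joined_parquet")

This is just a sketch, not a recommendation on tuning; for very large tables see the SPARK-6910 issue linked above.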