It somewhat depends on the definition of efficiently. From a workflow 
perspective I would agree, but from an I/O perspective, wouldn't there be the 
same multi-pass cost, since the Hive context still needs to push the 
data into HDFS? That said, if you're pushing the data into HDFS and then 
creating Hive tables via LOAD DATA (vs. a reference point à la external tables), I 
would agree with you.  
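To make the distinction concrete, here is a rough sketch of the two approaches. Treat it as untested: it assumes a running Spark 1.1 deployment built with Hive support, and the paths, table names, and columns are all illustrative.

```scala
// Sketch only: assumes a Spark 1.1 deployment with Hive support.
// All paths, table names, and columns below are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveLoadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-load-sketch"))
    val hiveContext = new HiveContext(sc)

    // Option 1: LOAD DATA moves the staged files into the Hive warehouse
    // directory -- i.e., a second pass over data already written to HDFS.
    hiveContext.sql(
      "LOAD DATA INPATH '/tmp/staging/dt=2014-09-10' " +
      "INTO TABLE events PARTITION (dt='2014-09-10')")

    // Option 2: an external table just references the existing HDFS
    // location, so no additional copy of the data is made.
    hiveContext.sql(
      "CREATE EXTERNAL TABLE IF NOT EXISTS events_ext (id INT, payload STRING) " +
      "PARTITIONED BY (dt STRING) LOCATION '/tmp/staging'")
    hiveContext.sql(
      "ALTER TABLE events_ext ADD PARTITION (dt='2014-09-10') " +
      "LOCATION '/tmp/staging/dt=2014-09-10'")
  }
}
```

With the external-table route the only I/O is the initial write of the staged files; the table definition is pure metadata.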

And thanks for correcting me: registerTempTable is defined on the SQLContext.
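For reference, a minimal sketch of the call in question (Spark 1.1; the case class and table name are illustrative, and this needs a running Spark deployment with Hive support, so treat it as untested):

```scala
// Sketch only: assumes a Spark 1.1 deployment with Hive support.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Record(id: Int, value: String)

object TempTableSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("temp-table-sketch"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion

    val rdd = sc.parallelize(Seq(Record(1, "a"), Record(2, "b")))
    // registerTempTable is available on the SchemaRDD produced by either
    // context (HiveContext extends SQLContext). The temp table lives only in
    // this context's catalog, so a separate process such as the Thrift
    // server will not see it.
    rdd.registerTempTable("records")
    hiveContext.sql("SELECT COUNT(*) FROM records").collect().foreach(println)
  }
}
```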


On September 10, 2014 at 13:47:24, Du Li (l...@yahoo-inc.com) wrote:

Hi Denny,  

There is a related question by the way.  

I have a program that reads in a stream of RDDs, each of which is to be  
loaded into a Hive table as one partition. Currently I do this by first  
writing the RDDs to HDFS and then loading them into Hive, which requires  
multiple passes of HDFS I/O and serialization/deserialization.  

I wonder if it is possible to do it more efficiently with Spark 1.1  
streaming + SQL, e.g., by registering the RDDs into a hive context so that  
the data is loaded directly into the hive table in cache and meanwhile  
visible to jdbc/odbc clients. In the spark source code, the method  
registerTempTable you mentioned works on SqlContext instead of HiveContext.  

Thanks,  
Du  



On 9/10/14, 1:21 PM, "Denny Lee" <denny.g....@gmail.com> wrote:  

>Actually, when registering the table, it is only available within the  
>SparkContext you are running it in. For Spark 1.1, the method name was  
>changed to registerTempTable to better reflect that.  
>  
>The Thrift server process runs under a different process meaning that it  
>cannot see any of the tables generated within the sc context. You would  
>need to save the sc table into Hive and then the Thrift process would be  
>able to see them.  
>  
>HTH!  
>  
>> On Sep 10, 2014, at 13:08, alexandria1101  
>><alexandria.shea...@gmail.com> wrote:  
>>  
>> I used the hiveContext to register the tables and the tables are still  
>>not  
>> being found by the thrift server. Do I have to pass the hiveContext to  
>>JDBC  
>> somehow?  
>>  
>>  
>>  
>> --  
>> View this message in context:  
>>http://apache-spark-user-list.1001560.n3.nabble.com/Table-not-found-using-jdbc-console-to-query-sparksql-hive-thriftserver-tp13840p13922.html  
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.  
>>  
>> ---------------------------------------------------------------------  
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org  
>> For additional commands, e-mail: user-h...@spark.apache.org  
>>  
>  
>  
