It looks like you have overwritten sc. Could you try this:

    Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)
    sc <- sparkR.init()
    hivecontext <- sparkRHive.init(sc)
    df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
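[Editor's note] For later readers, a minimal sketch of the full flow once the initialization above succeeds, using the Spark 1.x SparkR API as in the thread. The temp table name sparktest1 and the inspection calls are illustrative assumptions, not from the original mails:

```r
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()                  # create the Spark context exactly once
hivecontext <- sparkRHive.init(sc)   # HiveContext built on the same context

df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
printSchema(df)                      # inspect the schema read from the ORC files

# Illustrative: expose the DataFrame to SQL and pull a few rows back to R
registerTempTable(df, "sparktest1")
head(sql(hivecontext, "SELECT * FROM sparktest1 LIMIT 10"))
```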
Date: Tue, 12 Jan 2016 14:28:58 +0530
Subject: Re: sparkR ORC support.
From: sand...@infoworks.io
To: felixcheun...@hotmail.com
CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com; deepakmc...@gmail.com

The code is very simple, pasted below. hive-site.xml is in the Spark conf directory already. I still see this error after running the script below:

    Error in writeJobj(con, object) : invalid jobj 3

script=======
    Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)
    sc <<- sparkR.init()
    sc <<- sparkRHive.init()
    hivecontext <<- sparkRHive.init(sc)
    df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
    #View(df)

On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

    Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext. Could you forward us your code please?

On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com> wrote:

    You should ensure your sqlContext is a HiveContext.

        sc <- sparkR.init()
        sqlContext <- sparkRHive.init(sc)

2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:

    Felix, I tried the option suggested by you. It gave the error below. I am going to try the option suggested by Prem.

        Error in writeJobj(con, object) : invalid jobj 1
        8 stop("invalid jobj ", value$id)
        7 writeJobj(con, object)
        6 writeObject(con, a)
        5 writeArgs(rc, args)
        4 invokeJava(isStatic = TRUE, className, methodName, ...)
        3 callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
        2 read.df(sqlContext, filepath, "orc") at spark_api.R#108

On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

    Firstly, I don't have ORC data to verify, but this should work:

        df <- loadDF(sqlContext, "data/path", "orc")

    Secondly, could you check if sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.
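[Editor's note] A reading of the "invalid jobj" errors above (an interpretation, not an authoritative diagnosis): a jobj is a handle into the JVM backend that created it, so re-running sparkR.init()/sparkRHive.init(), or overwriting sc with <<- as in the script above, leaves earlier handles pointing at a backend that no longer owns them. A hedged sketch of a clean restart:

```r
library(SparkR)

# If a previous SparkR session may exist (e.g. in RStudio), stop it first
# so no stale jobj handles from the old backend are reused.
sparkR.stop()

sc <- sparkR.init()                 # one Spark context...
hivecontext <- sparkRHive.init(sc)  # ...and one HiveContext built from it

# After any restart, every DataFrame/jobj from the old session is invalid
# and must be re-created from scratch:
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
```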
_____________________________
From: Prem Sure <premsure...@gmail.com>
Sent: Tuesday, January 5, 2016 8:12 AM
Subject: Re: sparkR ORC support.
To: Sandeep Khurana <sand...@infoworks.io>
Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>

Yes Sandeep, also copy hive-site.xml to the Spark conf directory.

On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:

    Also, do I need to set up Hive in Spark as per the link http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ? Might we need to copy the hdfs-site.xml file to the Spark conf directory?

On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

    Deepak, tried this. Getting this error now:

        Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") : unused argument ("")

On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

    Hi Sandeep, can you try this?

        results <- sql(hivecontext, "FROM test SELECT id", "")

    Thanks
    Deepak

On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

    Thanks Deepak. I tried this as well. I created a hivecontext with

        hivecontext <<- sparkRHive.init(sc)

    When I tried to read a Hive table from it with

        results <- sql(hivecontext, "FROM test SELECT id")

    I get the error below. Not sure what is causing this - any leads or ideas? I am using RStudio.

        Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.

On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

    Hi Sandeep
    I am not sure if ORC can be read directly in R, but there is a workaround: first create a Hive table on top of the ORC files, then access that Hive table in R.
    Thanks
    Deepak

On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

    Hello

    I need to read ORC files in HDFS in R using Spark. I am not able to find a package to do that. Can anyone help with documentation or an example for this purpose?
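[Editor's note] Deepak's workaround (a Hive table layered over the ORC files, then queried from SparkR) can be sketched as below. The table name orc_test, its column list, and the HDFS location are illustrative assumptions - the location simply mirrors the path used elsewhere in the thread - and a working HiveContext is assumed:

```r
library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)

# Hypothetical external table over the existing ORC files in HDFS;
# the name, columns, and location are examples only.
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS orc_test (id INT)
                  STORED AS ORC
                  LOCATION '/data/ingest/sparktest1/'")

# In this SparkR version sql() takes two arguments (context, query),
# which is why the trailing \"\" in the thread gave 'unused argument'.
results <- sql(hivecontext, "SELECT id FROM orc_test")
head(collect(results))
```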
--
Architect
Infoworks.io
http://Infoworks.io

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net