Re: sparkR ORC support.

Sandeep Khurana Wed, 06 Jan 2016 04:36:13 -0800

Felix

I tried the option you suggested. It gave the error below. I am going to
try the option suggested by Prem.


Error in writeJobj(con, object) : invalid jobj 1
8: stop("invalid jobj ", value$id)
7: writeJobj(con, object)
6: writeObject(con, a)
5: writeArgs(rc, args)
4: invokeJava(isStatic = TRUE, className, methodName, ...)
3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2: read.df(sqlContext, filepath, "orc") at spark_api.R#108
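
A minimal sketch of the call order discussed in this thread, assuming the SparkR 1.x API (sparkR.init, sparkRHive.init, read.df, sql). The HDFS path is a placeholder and the table name "test" is taken from the quoted messages; none of this has been verified against real ORC data. As far as I know, ORC support in Spark 1.x lives in the Hive module, which is why the sketch reads through the Hive context rather than a plain SQLContext.

```r
library(SparkR)

# Create the Spark context first. References (jobj) created before a
# sparkR.stop() or an R session restart are no longer valid afterwards,
# so the contexts must be re-created in the same session that uses them.
sc <- sparkR.init()

# Create the Hive-backed context only after sparkR.init() has returned.
hivecontext <- sparkRHive.init(sc)

# Placeholder path (assumption): replace with the real ORC location.
df <- read.df(hivecontext, "hdfs:///path/to/orc", source = "orc")
head(df)

# sql() takes exactly two arguments here: the context and the query string.
results <- sql(hivecontext, "FROM test SELECT id")
head(results)
```

If an "invalid jobj" error still appears, re-running every line from sparkR.init() onward in one fresh session is a cheap way to rule out stale references.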

On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> Firstly I don't have ORC data to verify but this should work:
>
> df <- loadDF(sqlContext, "data/path", "orc")
>
> Secondly, could you check if sparkR.stop() was called? sparkRHive.init()
> should be called after sparkR.init() - please check if there is any error
> message there.
>
> _____________________________
> From: Prem Sure <premsure...@gmail.com>
> Sent: Tuesday, January 5, 2016 8:12 AM
> Subject: Re: sparkR ORC support.
> To: Sandeep Khurana <sand...@infoworks.io>
> Cc: spark users <user@spark.apache.org>, Deepak Sharma <
> deepakmc...@gmail.com>
>
>
>
> Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
>
>
> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io>
> wrote:
>
>> Also, do I need to set up Hive in Spark as per the link
>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark
>> ?
>>
>> Might we also need to copy the hdfs-site.xml file to the Spark conf directory?
>>
>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io>
>> wrote:
>>
>>> Deepak
>>>
>>> Tried this. Getting this error now
>>>
>>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :
>>> unused argument ("")
>>>
>>>
>>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com>
>>> wrote:
>>>
>>>> Hi Sandeep
>>>> can you try this ?
>>>>
>>>> results <- sql(hivecontext, "FROM test SELECT id","")
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io>
>>>> wrote:
>>>>
>>>>> Thanks Deepak.
>>>>>
>>>>> I tried this as well. I created a hivecontext with "hivecontext <<-
>>>>> sparkRHive.init(sc)".
>>>>>
>>>>> When I tried to read a Hive table from it,
>>>>>
>>>>> results <- sql(hivecontext, "FROM test SELECT id")
>>>>>
>>>>> I get the error below:
>>>>>
>>>>> Error in callJMethod(sqlContext, "sql", sqlQuery) :   Invalid jobj 2. If 
>>>>> SparkR was restarted, Spark operations need to be re-executed.
>>>>>
>>>>>
>>>>> I am not sure what is causing this. Any leads or ideas? I am using RStudio.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Sandeep
>>>>>> I am not sure if ORC can be read directly in R.
>>>>>> But there is a workaround: first create a Hive table on top of the ORC
>>>>>> files, and then access the Hive table in R.
>>>>>>
>>>>>> Thanks
>>>>>> Deepak
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io
>>>>>> > wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I need to read ORC files in HDFS from R using Spark. I am not able
>>>>>>> to find a package to do that.
>>>>>>>
>>>>>>> Can anyone help with documentation or example for this purpose?
>>>>>>>
>>>>>>> --
>>>>>>> Architect
>>>>>>> Infoworks.io <http://infoworks.io>
>>>>>>> http://Infoworks.io
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Deepak
>>>>>> www.bigdatabig.com
>>>>>> www.keosha.net
>>>>>>


-- 
Architect
Infoworks.io
http://Infoworks.io
