library() is the package-loading function in native R, and as of now it does not support HDFS paths, although there are several packages out there that might help.
Another approach is a prefetch/installation step that calls the HDFS command line to download the R package from HDFS onto the worker node first, and then loads it from the local path (a rough sketch is appended below the quoted message).

_____________________________
From: Senthil Kumar <senthilec...@gmail.com>
Sent: Wednesday, August 17, 2016 2:23 AM
Subject: Spark R - Loading Third Party R Library in YARN Executors
To: Senthil kumar <senthilec...@gmail.com>, <du...@ebay.com>, <jiaj...@ebay.com>, <dev@spark.apache.org>

Hi All,

We are using the Spark 1.6 R library. Below is our code, which loads the third-party library:

    library("BreakoutDetection", lib.loc = "hdfs://xxxxxx/BreakoutDetection/")
    library("BreakoutDetection", lib.loc = "//xxxxxx/BreakoutDetection/")

When I execute the code in local mode, the SparkR code works fine without any issue. If I submit the job to the cluster, we end up with this error:

    error in evaluating the argument 'X' in selecting a method for function 'lapply':
    Error in library("BreakoutDetection", lib.loc = "hdfs://xxxxxxx/BreakoutDetection/") :
      no library trees found in 'lib.loc'
    Calls: f ... lapply -> FUN -> mainProcess -> angleValid -> library

Can't we read libraries in R as below?

    library("BreakoutDetection", lib.loc = "hdfs://xxxxxx/BreakoutDetection/")

If not, what is the other way to solve this problem? Since our cluster has close to 2500 nodes, we can't copy the third-party libs to all nodes, and copying to all DNs is not good practice either. Can someone help me with how to load R libs from HDFS, or any other way?

--Senthil
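
Here is a rough, untested sketch of that prefetch idea, meant to be called at the top of the function that runs on each executor. The helper name and the local temp directory are made up for illustration, and it assumes the `hdfs` CLI is available on the PATH of every worker node:

    # Hypothetical helper: pull the package directory down from HDFS on the
    # executor, then load it from the local copy instead of an hdfs:// lib.loc.
    loadPackageFromHDFS <- function(pkg, hdfsDir) {
      localLib <- file.path(tempdir(), "rlibs")
      dir.create(localLib, recursive = TRUE, showWarnings = FALSE)
      localPkg <- file.path(localLib, pkg)
      if (!file.exists(localPkg)) {
        # Copy e.g. hdfs://xxxxxx/BreakoutDetection/BreakoutDetection into the local lib dir.
        system(paste("hdfs dfs -copyToLocal", file.path(hdfsDir, pkg), localLib))
      }
      library(pkg, lib.loc = localLib, character.only = TRUE)
    }

    # Then, inside the function that gets shipped to the executors, something like:
    # loadPackageFromHDFS("BreakoutDetection", "hdfs://xxxxxx/BreakoutDetection")

Whether the HDFS directory is itself the library tree or a directory containing the installed package depends on how it was uploaded, so the exact hdfsDir/pkg layout above is an assumption; adjust the paths to match what is actually on HDFS.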