Hello all,
I want to use Apex to execute R scripts, where the parameters for the
script arrive as tuples. In this regard, I have a few questions:
* I presume that the R dependencies must be installed on all of
the Hadoop nodes and that the R script must be put on the classpath?
The R script refers to a few R libraries in its code.
* Is it fair to say that YARN container allocation does not apply
exactly here, because the ScriptOperator (named Rscript in Malhar)
uses the REngine, which is present locally as a binary? This matters
especially if the R script itself uses parallelism in its code. I
am asking in order to plan the resources required for such an
implementation.
* Is there good documentation, or a pointer to best practices, for
developing applications that use the ScriptOperator or equivalent
constructs, where external code might be executed?
Regards,
Ananth