You could store the jar in HDFS. Then the workaround you describe should work even in YARN cluster mode. A rough sketch of what I mean is below the quoted message.

Michael Shtelma <mshte...@gmail.com> wrote on Fri, Jan 12, 2018 at 12:58:
> Hi all,
>
> I would like to be able to compile Spark UDFs at runtime. Right now I
> am using Janino for that.
> My problem is that, in order to make my compiled functions visible to
> Spark, I have to set the Janino class loader (Janino gives me a class
> loader with the compiled UDF classes) as the context class loader before
> I create the Spark Session. This approach works locally for debugging
> purposes, but it is not going to work in cluster mode, because the UDF
> classes will not be distributed to the worker nodes.
>
> An alternative is to register the UDF via the Hive functionality and
> generate a temporary jar somewhere, which, at least in standalone cluster
> mode, will be made available to the Spark workers via the embedded HTTP
> server. As far as I understand, this is not going to work in YARN mode.
>
> I am now wondering what the best way to approach this problem is. My
> current best idea is to develop my own small Netty-based file web server
> and use it to distribute my custom jar, which can be created on the fly,
> to the workers in both standalone and YARN modes. Can I reference the jar
> as an HTTP URL using the extra driver options and then register the UDFs
> contained in this jar using the spark.udf().* methods?
>
> Does anybody have any better ideas?
> Any assistance would be greatly appreciated!
>
> Thanks,
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
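For reference, a minimal sketch of the HDFS route, assuming the jar with the generated UDF classes has already been written to a local file (e.g. from your Janino output) and that the class is a Hive-style UDF; the paths, jar name, and class name below are hypothetical placeholders:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Hypothetical locations and names; adjust to your environment.
val localJar = new Path("file:///tmp/generated-udfs.jar")
val hdfsJar  = new Path("hdfs:///tmp/udf-jars/generated-udfs.jar")

val spark = SparkSession.builder()
  .appName("runtime-udf-registration")
  .enableHiveSupport()  // CREATE FUNCTION ... USING JAR goes through the Hive code path
  .getOrCreate()

// 1. Copy the jar that was generated on the fly to HDFS, so that the driver
//    and all executors can fetch it, also in yarn-cluster mode.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.copyFromLocalFile(false, true, localJar, hdfsJar)

// 2. Register the function; Spark pulls the jar from HDFS when the function
//    is resolved. The class is assumed to be a Hive-style UDF.
spark.sql(
  s"""CREATE TEMPORARY FUNCTION my_generated_udf
     |AS 'com.example.MyGeneratedUdf'
     |USING JAR '${hdfsJar.toString}'""".stripMargin)

spark.sql("SELECT my_generated_udf('some input')").show()

If the jar already exists before submit time, passing it via --jars or spark.jars would of course also work; copying it to HDFS at runtime is what covers the on-the-fly case in yarn-cluster mode.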