Thanks! Yes, this would be an option, of course: HDFS or Alluxio.

Sincerely,
Michael Shtelma
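For reference, the HDFS variant could look roughly like this once the generated jar has been uploaded (just a sketch: the jar path, class name, and function name are placeholders, and it assumes the generated class is a Hive-style UDF so that it can be registered via CREATE FUNCTION ... USING JAR):

import org.apache.spark.sql.SparkSession

object RegisterUdfFromHdfsJar {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("runtime-generated-udf")
      .enableHiveSupport()  // CREATE FUNCTION ... USING JAR is resolved through the Hive catalog
      .getOrCreate()

    // assumption: the jar generated at runtime has already been copied to HDFS (an
    // alluxio:// URI would work the same way), and com.example.GeneratedUpper inside
    // it is a Hive-style UDF
    val jarUri = "hdfs:///tmp/generated-udfs.jar"

    spark.sql(
      s"CREATE TEMPORARY FUNCTION generated_upper AS 'com.example.GeneratedUpper' USING JAR '$jarUri'")

    spark.sql("SELECT generated_upper('hello')").show()
  }
}

With an hdfs:// or alluxio:// URI in the USING JAR clause, shipping the jar to the executors should be handled by Spark itself, which should also cover yarn cluster mode.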
On Fri, Jan 12, 2018 at 3:26 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
> You could store the jar in HDFS. Then even in yarn cluster mode the
> workaround you describe should work.
>
> Michael Shtelma <mshte...@gmail.com> schrieb am Fr. 12. Jan. 2018 um 12:58:
>>
>> Hi all,
>>
>> I would like to be able to compile Spark UDFs at runtime. Right now I
>> am using Janino for that.
>> My problem is that, in order to make the compiled functions visible to
>> Spark, I have to set the Janino classloader (Janino gives me a
>> classloader containing the compiled UDF classes) as the context class
>> loader before I create the Spark Session. This approach works locally
>> for debugging purposes, but it is not going to work in cluster mode,
>> because the UDF classes will not be distributed to the worker nodes.
>>
>> An alternative is to register the UDF via the Hive functionality and to
>> generate a temporary jar somewhere, which, at least in standalone
>> cluster mode, will be made available to the Spark workers via the
>> embedded HTTP server. As far as I understand, this is not going to work
>> in yarn mode.
>>
>> I am now wondering what the best way to approach this problem is. My
>> current best idea is to develop my own small Netty-based file web server
>> and use it to distribute my custom jar, which can be created on the fly,
>> to the workers in both standalone and yarn modes. Can I reference the
>> jar as an HTTP URL using the extra driver options and then register the
>> UDFs contained in this jar using the spark.udf().* methods?
>>
>> Does anybody have any better ideas?
>> Any assistance would be greatly appreciated!
>>
>> Thanks,
>> Michael
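P.S. For completeness, this is roughly what the Janino-based approach from the quoted message looks like in local mode (a sketch only: the generated class name and UDF body are made up, and, as discussed, the executors never see the generated class, so this does not carry over to cluster mode as is):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.StringType
import org.codehaus.janino.SimpleCompiler

object JaninoUdfLocalDemo {
  def main(args: Array[String]): Unit = {
    // generated Java source; the raw UDF1 interface keeps the source simple for Janino
    val source =
      """import org.apache.spark.sql.api.java.UDF1;
        |public class GeneratedUpper implements UDF1 {
        |  public Object call(Object s) { return s == null ? null : s.toString().toUpperCase(); }
        |}""".stripMargin

    val compiler = new SimpleCompiler()
    compiler.cook(source)

    // set the Janino class loader as the context class loader before the session is
    // created, so the driver can resolve the generated class
    Thread.currentThread().setContextClassLoader(compiler.getClassLoader)

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("janino-udf-local")
      .getOrCreate()

    val udf = compiler.getClassLoader
      .loadClass("GeneratedUpper")
      .newInstance()
      .asInstanceOf[UDF1[_, _]]

    spark.udf.register("generated_upper", udf, StringType)
    spark.sql("SELECT generated_upper('hello')").show()
  }
}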