Thanks!
Yes, this would be an option, of course: HDFS or Alluxio.
Sincerely,
Michael Shtelma


On Fri, Jan 12, 2018 at 3:26 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
> You could store the jar in HDFS. Then your proposed workaround should
> work even in YARN cluster mode.
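> Roughly along these lines, assuming the generated jar has already been
> written to HDFS (the path below is just a placeholder):
>
>   // make the generated jar visible to the executors at runtime
>   spark.sparkContext().addJar("hdfs:///tmp/generated-udfs.jar");
>   // or pass it at submit time:
>   //   spark-submit --jars hdfs:///tmp/generated-udfs.jar ...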
> Michael Shtelma <mshte...@gmail.com> wrote on Fri, 12 Jan 2018 at 12:58:
>>
>> Hi all,
>>
>> I would like to be able to compile Spark UDFs at runtime. Right now I
>> am using Janino for that.
>> My problem is that, in order to make my compiled functions visible to
>> Spark, I have to set the Janino class loader (Janino gives me a class
>> loader containing the compiled UDF classes) as the context class
>> loader before I create the SparkSession. This approach works locally
>> for debugging purposes, but it is not going to work in cluster mode,
>> because the UDF classes will not be distributed to the worker nodes.
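>> For reference, a simplified sketch of what I currently do (exception
>> handling omitted; the class and UDF names are just examples):
>>
>>   import org.apache.spark.sql.SparkSession;
>>   import org.codehaus.janino.SimpleCompiler;
>>
>>   // compile the UDF source in memory with Janino
>>   SimpleCompiler compiler = new SimpleCompiler();
>>   compiler.cook(
>>       "import org.apache.spark.sql.api.java.UDF1;"
>>       + " public class PlusOne implements UDF1<Integer, Integer> {"
>>       + "   public Integer call(Integer x) { return x + 1; } }");
>>   // expose the compiled classes before the session is created
>>   Thread.currentThread().setContextClassLoader(compiler.getClassLoader());
>>   SparkSession spark = SparkSession.builder().getOrCreate();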
>>
>> An alternative is to register the UDF via the Hive functionality and
>> generate a temporary jar somewhere; at least in standalone cluster
>> mode, that jar will be made available to the Spark workers via the
>> embedded HTTP server. As far as I understand, this is not going to
>> work in YARN mode.
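>> If I went that route, the registration itself would look roughly like
>> this, assuming Hive support is enabled (the function name, class name
>> and jar path are placeholders):
>>
>>   spark.sql("CREATE TEMPORARY FUNCTION plus_one AS 'PlusOne' "
>>       + "USING JAR '/tmp/generated-udfs.jar'");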
>>
>> I am wondering now what the best way to approach this problem is. My
>> current best idea is to develop my own small Netty-based file web
>> server and use it to distribute my custom jar, which can be created
>> on the fly, to the workers in both standalone and YARN modes. Can I
>> reference the jar in the form of an HTTP URL using the extra driver
>> options and then register the UDFs contained in this jar using the
>> spark.udf().* methods?
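>> In other words, roughly the following, where the URL, class name and
>> function name are made up:
>>
>>   // fetch the on-the-fly jar from my own file server
>>   spark.sparkContext().addJar("http://driver-host:8080/generated-udfs.jar");
>>   // register a UDF class shipped in that jar
>>   spark.udf().registerJava("plus_one", "PlusOne",
>>       org.apache.spark.sql.types.DataTypes.IntegerType);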
>>
>> Does anybody have any better ideas?
>> Any assistance would be greatly appreciated!
>>
>> Thanks,
>> Michael
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
