jdye64 commented on issue #173:
URL: https://github.com/apache/arrow-ballista/issues/173#issuecomment-1387195133

   Yes, @adriangb is right. Much of the pain comes from trying to serialize and 
execute Python code with dependencies on remote nodes. This has been the case 
for years, going back even to Hive UDFs.
   
   The Python ecosystem as a whole relies heavily on existing dependencies. 
Therefore, I think if we can come up with a straightforward method for ensuring 
that all of the executors have a valid virtual environment with all the 
dependencies required by the UDF installed, we should be good. This is the 
approach we take in some parts of Dask, for example.
   
   So maybe as part of the Python UDF registration we require a "list" of 
dependencies that are required by the UDF. When the executor server starts up 
it could create that virtual env, through pip or conda or whatever is chosen, 
and install those dependencies. Think of it as an executor server 
bootstrapping process. Then, when any SQL queries are submitted, the UDF can be 
serialized and sent to the executor. Once there, the UDF can be executed in that 
virtual environment.
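
   To make the bootstrapping idea concrete, here is a rough sketch using only 
the Python standard library. Nothing here is an existing Ballista API: 
`UdfSpec`, `pip_install_command`, and `bootstrap_executor_env` are hypothetical 
names, and it assumes pip (not conda) is the chosen installer.

```python
import subprocess
import sys
import venv
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class UdfSpec:
    """Hypothetical registration record: a UDF name plus its pip requirements."""
    name: str
    requirements: tuple = ()


def env_python(env_dir: Path) -> Path:
    """Return the path to the interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, specs) -> list:
    """Build one pip command covering every registered UDF's requirements."""
    # Deduplicate and sort so every executor builds the same command.
    requirements = sorted({req for spec in specs for req in spec.requirements})
    if not requirements:
        return []
    return [str(env_python(env_dir)), "-m", "pip", "install", *requirements]


def bootstrap_executor_env(env_dir: Path, specs) -> None:
    """Create the virtual environment and install dependencies at startup."""
    venv.create(env_dir, with_pip=True)
    command = pip_install_command(env_dir, specs)
    if command:
        # check=True makes a failed install abort the executor bootstrap.
        subprocess.run(command, check=True)
```

   An executor would call `bootstrap_executor_env` once at startup with the 
registered specs, then run deserialized UDFs with the interpreter returned by 
`env_python`. Pinning versions in the requirement strings would help keep all 
executors consistent.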
   
   A couple of thoughts:
   - Maybe that information about Python dependencies could live in the 
"catalog" description space of flight_sql in ballista?
   - I think being able to run Python UDFs with dependencies is a must. It is 
almost not even worth having Python UDF support if dependencies can't be used. 
This is just my opinion and not a fact.
   - I can remember old versions of Hive requiring the user to manually SSH to 
each node and install those Python dependencies by hand. It was the quickest 
route I ever discovered to making enemies in the DevOps teams =) . I think 
this path is a non-starter.


