> Which Python version will run that stored procedure?
>
> All Python versions supported in PySpark
>

Where in the stored procedure is the exact Python version that will run
the code defined? That was the question.
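
As far as I understand, today the interpreter is picked purely by
deployment-level configuration such as PYSPARK_PYTHON / spark.pyspark.python,
not by anything stored with the procedure itself. A rough sketch (interpreter
path is illustrative, local mode):

    import os
    from pyspark.sql import SparkSession

    # The worker interpreter comes from env/conf set before the session
    # starts, e.g. PYSPARK_PYTHON (or spark.pyspark.python); nothing in the
    # stored procedure body pins a version.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.10"

    spark = SparkSession.builder.master("local[2]").getOrCreate()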


> How to manage external dependencies?
>
> Existing way we have
> https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
> .
> In fact, this will use the external dependencies within your Python
> interpreter so you can use all existing conda or venvs.
>
The current proposal doesn't solve this issue at all (the stored code doesn't
provide any manifest of its dependencies or of what is required to run it).
So it feels like it's better to stay with UDFs, since they are under our
control and their behaviour is predictable. Did I miss something?
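
For context, the "existing way" from that page is, as far as I can tell,
packing an environment with conda-pack and shipping it as an archive, i.e. it
is all project/deployment level rather than per stored procedure. Roughly
(archive and env names are illustrative):

    import os
    from pyspark.sql import SparkSession

    # Environment prepared beforehand with e.g.:
    #   conda pack -f -o pyspark_conda_env.tar.gz
    os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

    spark = (
        SparkSession.builder
        # Ship the packed env to the cluster; workers unpack it as ./environment
        .config("spark.archives", "pyspark_conda_env.tar.gz#environment")
        .getOrCreate()
    )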

> How to test it via a common CI process?
>
> Existing way of PySpark unittests, see
> https://github.com/apache/spark/tree/master/python/pyspark/tests
>
Sorry, but this wouldn't work, since a stored procedure requires a specific
definition and its code will not be stored as regular Python code. Do you
have any examples of how to test stored Python procedures as a unit, e.g.
without Spark?
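
For reference, the pattern in those tests is roughly the sketch below (names
are illustrative): the code under test is a regular importable Python module
and a local SparkSession is spun up inside the test. Neither of those seems
to hold for a procedure stored in the catalog, which is why I'm asking.

    import unittest
    from pyspark.sql import SparkSession

    class ProcedureLogicTest(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            # Local Spark, no cluster needed for the unit test itself.
            cls.spark = SparkSession.builder.master("local[2]").getOrCreate()

        @classmethod
        def tearDownClass(cls):
            cls.spark.stop()

        def test_logic(self):
            df = self.spark.createDataFrame([(1,), (2,)], ["x"])
            self.assertEqual(df.count(), 2)

    if __name__ == "__main__":
        unittest.main()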

> How to manage versions and do upgrades? Migrations?
>
> This is a new feature so no migration is needed. We will keep the
> compatibility according to the semver we follow.
>
The question was not about Spark, but about the stored procedures themselves.
Any guidelines that will not copy the flaws of other systems?

> Current Python UDF solution handles these problems in a good way since they
> delegate them to project level.
>
> Current UDF solution cannot handle stored procedures because UDF is on the
> worker side. This is Driver side.
>
How so? Currently it works, and we have never faced such an issue. Maybe you
should have the same Python code on the driver side as well? But such a
trivial idea doesn't require a new feature in Spark, since you already have
to ship that code somehow.
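
To illustrate "ship that code somehow": with UDFs the same module is already
distributed to executors and importable on the driver, so nothing new seems
to be needed. A sketch (my_procedures.py is a hypothetical module):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf

    spark = SparkSession.builder.master("local[2]").getOrCreate()

    # Ships the (hypothetical) module to executors and adds it to the
    # driver's sys.path as well.
    spark.sparkContext.addPyFile("my_procedures.py")

    import my_procedures

    plus_one = udf(my_procedures.plus_one, "int")   # runs on workers
    my_procedures.run_on_driver(spark)              # same code, driver side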

--
,,,^..^,,,
