This is a great feature! How do we define the language runtime, e.g. the
Python version and libraries? Do we assume the Python runtime is the same
as the PySpark worker's?

On Thu, Jan 8, 2026 at 3:12 AM huaxin gao <[email protected]> wrote:

> Hi All,
>
> I’d like to start a discussion on a draft SPIP:
> <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>
>
> *SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with Catalog
> SPI and CRUD*
>
> *Problem:* Spark can’t load SQL/Python function bodies from external
> catalogs in a standard way today, so users rely on session registration or
> vendor extensions.
>
> *Proposal:*
>
>    - Add CodeLiteralFunctionCatalog (Java SPI) returning CodeFunctionSpec
>      with implementations (spark-sql, python, python-pandas); a rough
>      interface sketch follows this list.
>    - Resolution:
>       - SQL: parse + inline (deterministic ⇒ foldable).
>       - Python/pandas: run via existing Python UDF / pandas UDF runtime
>         (opaque).
>       - SQL TVF: parse to plan, substitute params, validate schema.
>    - DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
>      implements the SPI; otherwise fall back.
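>
>    For concreteness, here is a rough, non-normative sketch of what the SPI
>    surface could look like. Beyond the names CodeLiteralFunctionCatalog and
>    CodeFunctionSpec, the package, method names, and signatures are
>    illustrative placeholders, not part of the SPIP:
>
>      import java.util.Map;
>      import java.util.Optional;
>      import org.apache.spark.sql.connector.catalog.Identifier;
>      import org.apache.spark.sql.types.DataType;
>      import org.apache.spark.sql.types.StructType;
>
>      // Hypothetical sketch only; the two interfaces would live in separate files.
>      public interface CodeLiteralFunctionCatalog {
>        // Look up a code-literal function; empty if this catalog does not define it.
>        Optional<CodeFunctionSpec> loadCodeFunction(Identifier ident);
>
>        // CRUD hooks that CREATE/REPLACE/DROP FUNCTION could delegate to.
>        void createCodeFunction(Identifier ident, CodeFunctionSpec spec, boolean replace);
>        boolean dropCodeFunction(Identifier ident);
>      }
>
>      public interface CodeFunctionSpec {
>        StructType inputSchema();   // declared parameter schema
>        DataType returnType();      // declared result type
>        boolean deterministic();    // SQL bodies: deterministic implies foldable after inlining
>
>        // Language tag -> code literal (source body),
>        // e.g. "spark-sql", "python", "python-pandas".
>        Map<String, String> implementations();
>      }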
>
> *Precedence + defaults:*
>
>    - Unqualified: temp/session > built-in/DSv2 > code-literal (current
>      catalog). Qualified names resolve only in the named catalog.
>    - Defaults: feature on, SQL on, Python/pandas off; optional
>      languagePreference (see the hedged config sketch after this list).
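>
>    As a concrete illustration of the defaults, here is a hypothetical way a
>    session could opt in to Python/pandas implementations. The config key
>    names below are placeholders of mine, not names defined in the SPIP:
>
>      import org.apache.spark.sql.SparkSession;
>
>      public class CodeLiteralDefaultsExample {
>        public static void main(String[] args) {
>          SparkSession spark = SparkSession.builder().getOrCreate();
>          // Hypothetical keys; actual names would be settled during implementation.
>          // SQL code-literal functions: proposed to be on by default.
>          spark.conf().set("spark.sql.codeLiteralFunctions.sql.enabled", "true");
>          // Python / pandas code-literal functions: proposed to be off by default.
>          spark.conf().set("spark.sql.codeLiteralFunctions.python.enabled", "true");
>        }
>      }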
>
> Feedback is welcome!
>
> Thanks,
>
> Huaxin
>
