Hi Wenchen,

Great question. In the SPIP, the language runtime is carried in the function spec (for python / python-pandas), so catalogs can optionally declare constraints on the execution environment. Concretely, the spec can include these optional fields:

- pythonVersion (e.g., "3.10")
- requirements (pip-style specs)
- environmentUri (an optional pointer to a pre-built / admin-approved environment)
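To make that concrete, here’s a rough sketch of the shape (hypothetical, not the exact interface in the doc; names and types may change as the SPIP evolves):

    import java.util.List;
    import java.util.Optional;

    // Sketch only: how the optional runtime constraints could hang off
    // CodeFunctionSpec for python / python-pandas implementations.
    public interface CodeFunctionSpec {
      String language();                  // "spark-sql", "python", or "python-pandas"
      String body();                      // the code literal itself

      Optional<String> pythonVersion();   // e.g. "3.10"
      List<String> requirements();        // pip-style specs; empty if unconstrained
      Optional<String> environmentUri();  // pre-built / admin-approved environment
    }

The key point is that everything runtime-related is optional, so a catalog that doesn’t care about execution environments can simply omit these fields.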
For the initial stage, we assume execution uses the existing PySpark worker environment (the same as a regular Python UDF / pandas UDF). If pythonVersion / requirements are present, Spark can validate them against the current worker environment and fail fast (AnalysisException) if they’re not satisfied. environmentUri is intended as an extension point for future integrations (or vendor plugins) to select a vetted environment, but we don’t assume Spark will provision environments out of the box in v1.

Thanks,
Huaxin

On Wed, Jan 7, 2026 at 6:06 PM Wenchen Fan <[email protected]> wrote:

> This is a great feature! How do we define the language runtime? e.g. the
> Python version and libraries. Do we assume the Python runtime is the same
> as the PySpark worker?
>
> On Thu, Jan 8, 2026 at 3:12 AM huaxin gao <[email protected]> wrote:
>
>> Hi All,
>>
>> I’d like to start a discussion on a draft SPIP
>> <https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>:
>>
>> *SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with
>> Catalog SPI and CRUD*
>>
>> *Problem:* Spark can’t load SQL/Python function bodies from external
>> catalogs in a standard way today, so users rely on session registration
>> or vendor extensions.
>>
>> *Proposal:*
>>
>> - Add CodeLiteralFunctionCatalog (Java SPI) returning CodeFunctionSpec
>>   with implementations (spark-sql, python, python-pandas).
>> - Resolution:
>>   - SQL: parse + inline (deterministic ⇒ foldable).
>>   - Python/pandas: run via existing Python UDF / pandas UDF runtime
>>     (opaque).
>>   - SQL TVF: parse to plan, substitute params, validate schema.
>> - DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
>>   implements the SPI; otherwise fall back.
>>
>> *Precedence + defaults:*
>>
>> - Unqualified: temp/session > built-in/DSv2 > code-literal (current
>>   catalog). Qualified names resolve only in the named catalog.
>> - Defaults: feature on, SQL on, Python/pandas off; optional
>>   languagePreference.
>>
>> Feedback is welcome!
>>
>> Thanks,
>>
>> Huaxin
>>
