Hi All,
I’d like to start a discussion on a draft SPIP
<https://docs.google.com/document/d/186cTAZxoXp1p8vaSunIaJmVLXcPR-FxSiLiDUl8kK8A/edit?tab=t.0#heading=h.for1fb3tezo3>:

*SPIP: Catalog-backed Code-Literal Functions (SQL and Python) with Catalog
SPI and CRUD*
*Problem:* Spark can’t load SQL/Python function bodies from external
catalogs in a standard way today, so users rely on session registration or
vendor extensions.
*Proposal:*
- Add CodeLiteralFunctionCatalog (Java SPI) returning CodeFunctionSpec with
implementations (spark-sql, python, python-pandas); see the rough sketch
after this list.
- Resolution (a hypothetical selection sketch also follows the list):
   - SQL: parse + inline (deterministic ⇒ foldable).
   - Python/pandas: run via the existing Python UDF / pandas UDF runtime
(opaque).
   - SQL TVF: parse to plan, substitute params, validate schema.
- DDL: CREATE/REPLACE/DROP FUNCTION delegates to the catalog if it
implements the SPI; otherwise falls back to the current path.
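
To make the SPI shape concrete, here is a minimal Java sketch. Only the
names CodeLiteralFunctionCatalog and CodeFunctionSpec come from the draft;
every method and field below is my assumption for discussion, not the
proposed API surface.

import java.util.Map;
import java.util.Optional;

// Hypothetical sketch only: CodeLiteralFunctionCatalog and CodeFunctionSpec
// are names from the draft, but each method and field here is an assumption
// for discussion, not the proposed API.
public interface CodeLiteralFunctionCatalog {
  // Look up a function by name; empty if this catalog does not define it.
  Optional<CodeFunctionSpec> loadFunction(String[] namespace, String name);

  // CRUD hooks that CREATE/REPLACE/DROP FUNCTION could delegate to when
  // the catalog implements this SPI.
  void createFunction(String[] namespace, String name, CodeFunctionSpec spec,
      boolean replace);
  void dropFunction(String[] namespace, String name);
}

// Carries one code body per supported language, e.g.
// "spark-sql" -> "x + 1", "python" -> "def f(x): return x + 1".
final class CodeFunctionSpec {
  private final Map<String, String> implementations; // language -> code literal
  private final boolean deterministic;

  CodeFunctionSpec(Map<String, String> implementations, boolean deterministic) {
    this.implementations = implementations;
    this.deterministic = deterministic;
  }

  Map<String, String> implementations() { return implementations; }

  // Deterministic SQL bodies can be inlined and treated as foldable.
  boolean isDeterministic() { return deterministic; }
}

The DDL delegation above would map onto the createFunction/dropFunction
hooks in this sketch.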
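
And a similarly hypothetical sketch of the selection step during resolution,
building on CodeFunctionSpec above: pick the first enabled language, in
preference order, that the spec implements. All names here are made up.

import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical selection step (names invented): honors per-language
// toggles and an optional languagePreference ordering.
final class ImplementationSelector {
  private final List<String> preference; // e.g. ["spark-sql", "python-pandas", "python"]
  private final Map<String, Boolean> enabled; // per-language toggles

  ImplementationSelector(List<String> preference, Map<String, Boolean> enabled) {
    this.preference = preference;
    this.enabled = enabled;
  }

  // Returns the first enabled language, in preference order, that the
  // spec actually implements; empty means no runnable body.
  Optional<String> select(CodeFunctionSpec spec) {
    return preference.stream()
        .filter(lang -> enabled.getOrDefault(lang, false))
        .filter(lang -> spec.implementations().containsKey(lang))
        .findFirst();
  }
}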
*Precedence + defaults:*
- Unqualified: temp/session > built-in/DSv2 > code-literal (current
catalog). Qualified names resolve only in the named catalog.
- Defaults: feature on, SQL on, Python/pandas off; optional
languagePreference (config sketch below, with invented key names).
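
If the toggles land as SQL confs, opting in to Python/pandas bodies might
look like the following; the config key names are placeholders I invented,
not names from the draft.

import org.apache.spark.sql.SparkSession;

// The config keys below are placeholders for illustration only; the SPIP
// would define the real names.
public class EnablePythonBodies {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("code-literal-functions-demo")
        .master("local[*]")
        // Assumed key: feature is on by default.
        .config("spark.sql.codeLiteralFunctions.enabled", "true")
        // Assumed key: Python bodies are off by default, opt in here.
        .config("spark.sql.codeLiteralFunctions.python.enabled", "true")
        .getOrCreate();
    spark.stop();
  }
}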
Feedback is welcome!
Thanks,
Huaxin