adriangb opened a new pull request, #21509:
URL: https://github.com/apache/datafusion/pull/21509
## Which issue does this PR close?
- N/A (small additive enhancement)
## Rationale for this change
DataFusion already exposes `arrow_metadata(expr[, key])` for **reading**
Arrow field metadata, but has no way to **attach** metadata to a column from
SQL or the `Expr` DSL. Arrow field metadata is useful for propagating
annotations (units, semantic types, provenance, downstream hints) through a
query plan without materializing an extra value column.
This PR adds `with_metadata`, the symmetric counterpart to `arrow_metadata`.
## What changes are included in this PR?
A new core scalar UDF `with_metadata(expr, 'k1', 'v1'[, 'k2', 'v2', ...])`:
- **Value semantics:** pure pass-through of the first argument.
- **Schema semantics:** returns a `FieldRef` whose metadata is the input
field's metadata merged with the supplied key/value pairs; new keys overwrite
on collision. Input field **name**, **data type**, and **nullability** are
preserved, so `with_metadata(col, ...)` behaves as a transparent annotation.
- **Syntax:** variadic key/value literal pairs, modelled after
`named_struct`. Chosen over a list-of-pairs form because SQL lacks a tuple
literal and programmatic callers can simply splat an alternating `Vec<Expr>` of
literals.
- **Validation:** performed at planning time in `return_field_from_args`.
Requires an odd argument count ≥ 3 (the input expression plus at least one
key/value pair); each key must be a non-empty constant string, and each value
must be a constant string.
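The validation rules above can be modelled roughly as follows. This is a simplified, std-only sketch, not the PR's code: the real check lives in `return_field_from_args` and works on DataFusion's argument types, whereas `Arg` and `parse_metadata_pairs` here are hypothetical stand-ins that encode only the arity, key, and value rules listed.

```rust
/// Hypothetical stand-in for a planning-time argument: either a constant
/// string literal or some non-constant expression (not DataFusion's types).
#[derive(Debug, Clone)]
enum Arg {
    StrLit(String),
    NonLiteral,
}

/// Validates the trailing key/value arguments of
/// `with_metadata(expr, k1, v1, ...)` and returns them as pairs.
/// `args` excludes the first (pass-through) argument.
fn parse_metadata_pairs(args: &[Arg]) -> Result<Vec<(String, String)>, String> {
    // Total arity counts the pass-through expression as well.
    let total = 1 + args.len();
    if total < 3 {
        return Err(format!("with_metadata requires at least 3 arguments, got {total}"));
    }
    // An even total means some key has no matching value.
    if total % 2 == 0 {
        return Err(format!("with_metadata requires an odd number of arguments, got {total}"));
    }
    let mut pairs = Vec::with_capacity(args.len() / 2);
    for chunk in args.chunks_exact(2) {
        // Keys must be non-empty constant strings.
        let key = match &chunk[0] {
            Arg::StrLit(k) if !k.is_empty() => k.clone(),
            Arg::StrLit(_) => return Err("metadata key must not be empty".to_string()),
            Arg::NonLiteral => return Err("metadata key must be a constant string".to_string()),
        };
        // Values must be constant strings (empty is allowed).
        let value = match &chunk[1] {
            Arg::StrLit(v) => v.clone(),
            Arg::NonLiteral => return Err("metadata value must be a constant string".to_string()),
        };
        pairs.push((key, value));
    }
    Ok(pairs)
}
```

Checking arity before inspecting any pair keeps the error messages specific: an unpaired key is reported as an arity problem rather than as a missing value.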
Example usage:
```sql
-- attach one key
select arrow_metadata(with_metadata(id, 'unit', 'ms'), 'unit') from t;
-- ms
-- attach several and read the full map
select arrow_metadata(with_metadata(id, 'unit', 'ms', 'source', 'sensor'))
from t;
-- {metadata_key: the id field, source: sensor, unit: ms}
-- (metadata_key is metadata already present on the id field of t)
-- nesting composes; outer keys win on collision
select arrow_metadata(with_metadata(with_metadata(id, 'a', '1'), 'b', '2'))
from t;
```
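The merge and nesting behaviour in the examples above can be illustrated with a small std-only model. It is a sketch under stated assumptions, not the UDF itself: `FieldModel` and `with_metadata_field` are hypothetical names standing in for an Arrow field and the return-field computation, and the `Int32` type in the usage note is purely illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow field, keeping only the parts that
/// `with_metadata` preserves or changes (not arrow-rs's `Field`).
#[derive(Debug, Clone, PartialEq)]
struct FieldModel {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Returns a copy of `input` whose metadata is the input metadata merged
/// with `pairs`. A later insert of the same key overwrites the earlier one,
/// so supplied pairs win over existing metadata on collision.
/// Name, data type, and nullability are copied unchanged.
fn with_metadata_field(input: &FieldModel, pairs: &[(&str, &str)]) -> FieldModel {
    let mut out = input.clone();
    for (k, v) in pairs {
        out.metadata.insert((*k).to_string(), (*v).to_string());
    }
    out
}
```

Nesting then composes naturally: `with_metadata_field(&with_metadata_field(&id, &[("a", "1")]), &[("b", "2")])` keeps any pre-existing keys on `id`, adds `a` and `b`, and, if the outer call repeated `a`, the outer value would replace the inner one.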
Files touched:
- `datafusion/functions/src/core/with_metadata.rs` (new) — UDF impl + unit
tests
- `datafusion/functions/src/core/mod.rs` — registration in `functions()`,
`make_udf_function!`, and `expr_fn`
- `datafusion/sqllogictest/test_files/metadata.slt` — SQL-level coverage
(merge, overwrite, nesting, pass-through, error cases)
- `docs/source/user-guide/sql/scalar_functions.md` — regenerated via
`dev/update_function_docs.sh`
## Are these changes tested?
Yes:
- **Unit tests** (`datafusion/functions/src/core/with_metadata.rs`) covering
single-key attach, merge-with-overwrite on collision, multi-pair attach,
even-arity rejection, too-few-args rejection, and non-literal-key rejection.
- **SQL logic tests** (`metadata.slt`) covering attach/read roundtrip,
merging with pre-existing field metadata, collision overwrite, nested
`with_metadata(with_metadata(...))`, value pass-through, and planning-time
errors (even arity, i.e. an unpaired key; missing args; non-literal key;
empty key).
- `cargo fmt --all` clean; `cargo clippy -p datafusion-functions
--all-targets --all-features -- -D warnings` clean (the `mutable_key_type`
error surfaced by `--all-targets --all-features` on the full workspace is
pre-existing on `main` and unrelated to this PR).
## Are there any user-facing changes?
Yes — a new built-in scalar function `with_metadata` is now available in SQL
and via `datafusion_functions::expr_fn::with_metadata`. Generated docs are
updated accordingly. No existing behavior changes.