nooneuse opened a new pull request, #64621:
URL: https://github.com/apache/doris/pull/64621
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
**Problem Summary**:
- Previously, column DEFAULT values in Doris were primarily treated as
literal constants (or a small set of special built-ins), which limited
usability for common patterns like “derive default from current date/time” or
“compose a formatted default string”.
- This PR introduces expression-based column default values. Users can
define a column default using an expression composed of:
  - deterministic built-in functions, and
  - a limited set of allowed non-deterministic date/time functions (e.g.
  `now`/`current_timestamp`/`current_date`) to support time-dependent defaults.
- The PR also clarifies and enforces usage constraints to keep default
expressions safe and analyzable, and adds a compatibility guard for MoW
(merge-on-write) unique-key partial update to avoid unsafe default-filling
behavior.
**How to use**
Example (date/datetime defaults):
`d DATEV2 NOT NULL DEFAULT to_date(now())`
`dt DATETIMEV2(3) NOT NULL DEFAULT now(3)`
Example (composed string default):
`s STRING NOT NULL DEFAULT concat('a-', cast(to_date(now()) as string))`
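For context, the column definitions above could sit in a complete table definition like the sketch below. The table name `example_expr_default`, the `id` column, and the key, distribution, and replication settings are illustrative assumptions and are not taken from this PR; only the three DEFAULT clauses come from the examples above.

```sql
-- Hypothetical table; only the DEFAULT clauses are taken from this PR's examples.
CREATE TABLE example_expr_default (
    id INT NOT NULL,
    d  DATEV2 NOT NULL DEFAULT to_date(now()),
    dt DATETIMEV2(3) NOT NULL DEFAULT now(3),
    s  STRING NOT NULL DEFAULT concat('a-', cast(to_date(now()) as string))
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- Omitting d, dt and s lets the default expressions fill them at write time.
INSERT INTO example_expr_default (id) VALUES (1);

-- Explicitly provided values are used instead of the default expressions.
INSERT INTO example_expr_default (id, d, dt, s)
VALUES (2, '2024-01-01', '2024-01-01 00:00:00.000', 'manual');
```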
**Main limitations / constraints**
The default expression must be analyzable and side-effect free:
- No column references.
- No subqueries.
- No aggregate functions.
- No window functions.
- No UDFs.
Non-deterministic functions are restricted:
- Only time-related functions such as
`now`/`current_timestamp`/`current_date` are allowed in default expressions.
- Other non-deterministic functions (e.g. `rand()`) are rejected.
The result type must be compatible with the column type:
- The expression result is cast/coerced to the target column type during
analysis.
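As an illustration of these rules, the column-definition fragments below would be expected to fail analysis under this PR, apart from the last one. All column and table names, and the UDF name `my_udf`, are hypothetical. The exact error messages are not shown because the PR description does not state them.

```sql
-- Fragments of a column list (hypothetical names), one per constraint above.
c1 INT    DEFAULT other_col + 1,            -- expected to be rejected: column reference
c2 INT    DEFAULT (SELECT max(k) FROM t2),  -- expected to be rejected: subquery
c3 BIGINT DEFAULT sum(1),                   -- expected to be rejected: aggregate function
c4 DOUBLE DEFAULT rand(),                   -- expected to be rejected: non-time non-deterministic function
c5 INT    DEFAULT my_udf(1),                -- expected to be rejected: UDF (made-up name)

-- Expected to be accepted: deterministic built-ins plus an allowed time function,
-- with the result coerced to the column type.
c6 STRING DEFAULT concat('a-', cast(to_date(now()) as string))
```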
**Default value behavior in different scenarios**
INSERT without specifying the column:
- The column value is generated from the default expression at write time.
- For time-dependent expressions, the evaluated value depends on the
statement execution time.
INSERT explicitly providing a value:
- The provided value is used; the default expression is not applied.
MoW unique-key partial update (INSERT-triggered partial update, missing
non-key columns):
- If the table contains expression-default columns, partial update may
require filling missing columns using schema defaults.
- To avoid unsupported/unsafe behavior, this PR rejects such partial updates
by default with a clear error message.
- Users can explicitly allow it by setting the session variable
`allow_partial_update_with_expression_default=true` if they understand the
behavior and accept the risks/constraints (see Behavior Details).
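A minimal sketch of the guard and the opt-in follows. The table `example_mow` and its columns are hypothetical. `enable_unique_key_partial_update` is the existing Doris session variable for INSERT-triggered partial update; depending on the Doris version, other session settings may also be needed. The rejection shown in the comments reflects the behavior described above, not an observed error message.

```sql
-- Hypothetical MoW unique-key table with one expression-default column.
CREATE TABLE example_mow (
    k  INT NOT NULL,
    v  INT NULL,
    ts DATETIMEV2(3) NOT NULL DEFAULT now(3)
)
UNIQUE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true"
);

-- Turn on INSERT-triggered partial update (existing session variable).
SET enable_unique_key_partial_update = true;

-- With the guard at its default, this partial update is expected to be
-- rejected because the table contains an expression-default column.
INSERT INTO example_mow (k, v) VALUES (1, 10);

-- Explicit opt-in introduced by this PR; the same statement is then allowed.
SET allow_partial_update_with_expression_default = true;
INSERT INTO example_mow (k, v) VALUES (1, 10);
```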
> **Behavior Details**
> 1) Write-time missing columns (DML)
>
> **Covered behaviors**
>
> - INSERT/INSERT INTO ... SELECT ... where some target columns are omitted.
> - File-based ingestion (e.g. load/scan) where input data does not provide
all destination columns.
> - MoW unique-key partial update paths (INSERT-triggered partial update,
and load jobs with
unique_key_update_mode=UPDATE_FIXED_COLUMNS/UPDATE_FLEXIBLE_COLUMNS) where
non-specified columns are treated as “missing” and need to be filled.
>
> **What this PR does**
>
> - Adds support for expression-based column default values, allowing
defaults defined as expressions composed of:
>   - deterministic built-in functions, and
>   - a limited set of allowed non-deterministic date/time functions (e.g.
`now`/`current_timestamp`/`current_date`).
> - For normal DML inserts and ingestion planning, missing columns can use
the column’s default SQL expression (getDefaultValueSql()), so time-dependent
defaults behave as expected at write time.
> - For MoW unique-key partial update, if the table contains
expression-default columns, this PR rejects the operation by default and
provides an explicit session switch to allow it:
>   - `allow_partial_update_with_expression_default=true`
>
> **Why**
>
> - Partial update may require filling missing columns using schema
defaults. For expression defaults, the semantics (especially with
non-deterministic time functions) can be ambiguous and may differ from
“evaluate at write time per row”.
> - The default guard avoids silent incorrect results and forces users to
opt in only when they understand the implications.
>
> 2) Reading old data with missing columns (schema evolution read)
>
> **Covered behaviors**
>
> - Scanning/querying old rowsets/segments produced before a schema change
(e.g. light schema change), where newly added columns do not physically exist
in the old data files.
> - Any scan path that materializes missing columns using a default-value
iterator during read.
>
> **What this PR does**
>
> - Keeps the existing schema-evolution read behavior: missing columns are
filled by underlying literal defaults.
> - For expression-default columns, this PR stores an additional folded
literal (realDefaultValue) computed at DDL/analyze time. Read-time filling uses
realDefaultValue (a literal), not the original expression SQL.
>
> **Why**
>
> - The BE read path for missing columns expects a literal string that can
be parsed into a field value; it does not execute arbitrary expressions during
scan.
> - Using realDefaultValue preserves compatibility and avoids introducing an
expression execution dependency into read paths.
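>
> The consequence can be sketched as follows. This assumes a hypothetical
> existing table `example_old` with an `id` column and previously written rows,
> and assumes that `ADD COLUMN` accepts the same DEFAULT expression syntax as
> `CREATE TABLE`; the column name `created_at` is made up.
>
> ```sql
> -- Add an expression-default column via schema change.
> ALTER TABLE example_old
>     ADD COLUMN created_at DATETIMEV2(3) NOT NULL DEFAULT now(3);
>
> -- Rows written before the ALTER do not physically contain created_at.
> -- Per the description above, they are filled from the folded literal
> -- (realDefaultValue) computed at DDL/analyze time, so old rows would be
> -- expected to share one value rather than re-evaluating now(3) per read.
> SELECT id, created_at FROM example_old;
>
> -- Rows inserted after the ALTER evaluate the expression at write time.
> INSERT INTO example_old (id) VALUES (100);
> ```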
>
> 3) Point query / row store missing-column fill (read path)
>
> **Covered behaviors**
>
> Point query / rowid fetch / row-store related read paths that may need to
materialize a full row and fill columns that are missing in the underlying
storage representation.
>
> **What this PR does**
>
> Uses the same approach as schema-evolution scans: for expression-default
columns, read paths rely on the pre-computed literal realDefaultValue (exported
as column default string in descriptors) to fill missing columns.
>
> **Why**
>
> - These read paths are latency sensitive and are not designed to evaluate
expression defaults at read time.
> - Aligning point query / row store fill with the segment scan behavior
ensures consistent semantics and implementation simplicity.
### Release note
- Doris now supports expression-based column default values with strict
validation rules.
- INSERT can omit such columns and the default expression will be evaluated
at write time.
- For MoW unique-key tables, INSERT-triggered partial update is rejected by
default when expression-default columns exist; it can be enabled via
allow_partial_update_with_expression_default.
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Doris now supports expression-based column default values with
strict validation rules.
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should
merge into -->