Alexey Kudinkin created HUDI-3456:
-------------------------------------
Summary: Revisit Properties/Config Defaults handling
Key: HUDI-3456
URL: https://issues.apache.org/jira/browse/HUDI-3456
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Right now, whenever we compose a configuration, we essentially follow this
formula:
we take the user input, add defaults for missing properties, and seal the result
as the complete set of configs.
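The merge-then-seal step can be sketched roughly as follows. This is a hypothetical illustration, not Hudi's actual config code: the class and method names are made up, and the key string is assumed for the example. It shows that once user input and defaults are flattened into one map, a User who explicitly sets "ts" and a User who sets nothing produce identical configs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "merge defaults, then seal" pattern; names are illustrative.
public class MergedConfigSketch {

    // User-provided values win; defaults fill in whatever is missing.
    public static Map<String, String> seal(Map<String, String> userProps,
                                           Map<String, String> defaults) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(userProps);
        // The sealed map keeps only values, not where each value came from.
        return Map.copyOf(merged);
    }

    public static void main(String[] args) {
        // Key string assumed for illustration.
        String key = "hoodie.datasource.write.precombine.field";
        Map<String, String> defaults = Map.of(key, "ts");

        Map<String, String> explicit = seal(Map.of(key, "ts"), defaults); // User set "ts"
        Map<String, String> implicit = seal(Map.of(), defaults);          // User set nothing

        // A consumer sees two identical configs.
        System.out.println(explicit.equals(implicit)); // prints "true"
    }
}
```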
The problem with this approach is that the consumer of the configuration has no
way to tell whether a config value was provided by the User or filled in from
defaults. Such shading causes quite a few issues in places where the consumer
needs to know whether the User provided any input for a particular property
(right now that is simply impossible).
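One possible way to keep that information, sketched here only as an illustration and not as Hudi's API, is to layer defaults underneath user input instead of merging them. The JDK's java.util.Properties already supports this: getProperty consults the defaults table, while containsKey checks only the user-provided layer. The class and method names below are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch: keep defaults as a separate layer so provenance survives.
public class LayeredConfigSketch {

    public static Properties layered(Properties userProps, Properties defaults) {
        Properties props = new Properties(defaults); // defaults consulted only by getProperty
        props.putAll(userProps);                      // user values live in the top layer
        return props;
    }

    // True only if the User supplied the key; the defaults layer is not consulted.
    public static boolean isUserProvided(Properties props, String key) {
        return props.containsKey(key);
    }

    public static void main(String[] args) {
        String key = "hoodie.datasource.write.precombine.field"; // assumed key string
        Properties defaults = new Properties();
        defaults.setProperty(key, "ts");

        Properties props = layered(new Properties(), defaults);
        System.out.println(props.getProperty(key));     // prints "ts" (from defaults)
        System.out.println(isUserProvided(props, key)); // prints "false"
    }
}
```

Lookups still see the default, but a consumer that cares can ask whether the value actually came from the User.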
Take PRECOMBINE_FIELD_NAME as an example: by default it falls back to "ts". But
PRECOMBINE_FIELD_NAME is not a _required_ configuration (the User might opt in
to custom payload merging), and such shading makes it impossible for, e.g., the
Spark Relation to know whether this column was specified by the User, in which
case it has to be present in the schema, OR whether it is a default value we
assumed, in which case there is no guarantee that it will be present.
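If user input were kept apart from defaults, a reader such as the Spark Relation could apply the two different rules described above. The sketch below is hypothetical: the method name, the key string, and the choice to silently skip a missing default column are assumptions made for illustration, and `userProps` is assumed to contain only values the User actually set.

```java
import java.util.Optional;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch: validate the precombine field differently depending on its origin.
public class PrecombineResolutionSketch {

    public static final String PRECOMBINE_KEY = "hoodie.datasource.write.precombine.field"; // assumed
    public static final String DEFAULT_PRECOMBINE = "ts";

    // userProps is assumed to hold only User-provided values (no defaults merged in).
    public static Optional<String> resolvePrecombineField(Properties userProps, Set<String> schemaFields) {
        String userValue = userProps.getProperty(PRECOMBINE_KEY);
        if (userValue != null) {
            // Explicitly configured: the column must exist in the schema.
            if (!schemaFields.contains(userValue)) {
                throw new IllegalArgumentException(
                    "Precombine field '" + userValue + "' not found in schema " + schemaFields);
            }
            return Optional.of(userValue);
        }
        // Only the assumed default applies: use it if present, otherwise skip it.
        return schemaFields.contains(DEFAULT_PRECOMBINE) ? Optional.of(DEFAULT_PRECOMBINE) : Optional.empty();
    }

    public static void main(String[] args) {
        // User set nothing and the schema has no "ts" column: nothing to enforce.
        System.out.println(resolvePrecombineField(new Properties(), Set.of("id", "value"))); // prints "Optional.empty"

        Properties userProps = new Properties();
        userProps.setProperty(PRECOMBINE_KEY, "updated_at");
        System.out.println(resolvePrecombineField(userProps, Set.of("id", "updated_at"))); // prints "Optional[updated_at]"
    }
}
```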
This leads some places to over-correct for this behavior by injecting empty
strings ("") as a means to suppress the fallback to the default (since null is
treated as the condition to fall back).
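The empty-string workaround can be illustrated roughly like this. It is a hypothetical sketch, not code taken from Hudi: a lookup falls back to the default only on null, so "" passes through and every consumer has to special-case it as "no field".

```java
import java.util.Optional;

// Hypothetical sketch of the empty-string sentinel workaround.
public class EmptyStringSentinelSketch {

    static final String DEFAULT_PRECOMBINE = "ts";

    // Fallback triggered only by null, as described in the issue.
    public static String getOrDefault(String rawValue) {
        return rawValue != null ? rawValue : DEFAULT_PRECOMBINE;
    }

    // Each consumer must additionally remember that "" means "no field".
    public static Optional<String> precombineField(String rawValue) {
        String value = getOrDefault(rawValue);
        return value.isEmpty() ? Optional.empty() : Optional.of(value);
    }

    public static void main(String[] args) {
        System.out.println(precombineField(null));         // prints "Optional[ts]" (default injected)
        System.out.println(precombineField(""));           // prints "Optional.empty" (default suppressed)
        System.out.println(precombineField("updated_at")); // prints "Optional[updated_at]"
    }
}
```

The sentinel only works if every reader performs the same isEmpty() check; a reader that skips it would treat "" as a real column name.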
--
This message was sent by Atlassian Jira
(v8.20.1#820001)