Alexey Kudinkin created HUDI-3456:
-------------------------------------

             Summary: Revisit Properties/Config Defaults handling
                 Key: HUDI-3456
                 URL: https://issues.apache.org/jira/browse/HUDI-3456
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin


Right now, whenever we compose a configuration, we essentially follow the 
formula below:

We take the user input, add {+}defaults for missing properties{+}, and seal the 
result as the complete set of configs.

 

The problem with this approach is that a consumer of the configuration has no 
way to tell whether a given value was provided by the User or filled in from 
defaults. Such shading causes issues wherever a consumer needs to know whether 
the User provided any input for a particular property (right now that is simply 
impossible).

 

Take PRECOMBINE_FIELD_NAME as an example: by default it falls back to "ts". But 
PRECOMBINE_FIELD_NAME is not a _required_ configuration (since the User might 
opt for custom payload merging), and such shading makes it impossible for, e.g., 
the Spark Relation to know whether this column was specified by the User, and 
therefore has to be present in the schema, OR whether it is a default value (one 
we assumed), in which case there is no guarantee that it is present.
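
To make the shading concrete, here is a minimal sketch of the compose-and-seal 
formula using plain java.util.Properties. It is not Hudi's actual code: the 
class name, the seal() helper and the key string "precombine.field" are 
hypothetical and chosen only for illustration.

```java
import java.util.Properties;

public class SealedConfigSketch {
    // Hypothetical key and default, used only for illustration.
    static final String PRECOMBINE_KEY = "precombine.field";
    static final String PRECOMBINE_DEFAULT = "ts";

    // Compose a config as described above: user input plus defaults
    // for any missing properties, returned as one sealed set.
    static Properties seal(Properties userInput) {
        Properties sealed = new Properties();
        sealed.putAll(userInput);
        if (!sealed.containsKey(PRECOMBINE_KEY)) {
            sealed.setProperty(PRECOMBINE_KEY, PRECOMBINE_DEFAULT);
        }
        return sealed;
    }

    public static void main(String[] args) {
        // Case 1: the User explicitly asked for "ts".
        Properties explicit = new Properties();
        explicit.setProperty(PRECOMBINE_KEY, "ts");

        // Case 2: the User said nothing about the precombine field.
        Properties omitted = new Properties();

        // After sealing, the two configs hold the same mappings, so a
        // consumer cannot tell the two cases apart.
        System.out.println(seal(explicit).equals(seal(omitted)));
    }
}
```

In this sketch, a consumer that receives only the sealed Properties sees 
"ts" in both cases and has no basis for deciding whether the column must exist 
in the schema.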

 

This leads to some places over-correcting for this behavior and injecting 
empty strings "" as a means of suppressing the fallback to the default (since 
null is treated as the condition for falling back).
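
The over-correction can be sketched as follows. Again, this is only an 
illustration under assumed names (resolve(), hasPrecombineField() and the key 
string are hypothetical), not the code of any particular call site.

```java
import java.util.Properties;

public class EmptyStringOverrideSketch {
    // Hypothetical key and default, used only for illustration.
    static final String PRECOMBINE_KEY = "precombine.field";
    static final String PRECOMBINE_DEFAULT = "ts";

    // A missing (null) value triggers the fallback; any non-null value,
    // including the empty string, suppresses it.
    static String resolve(Properties props) {
        String value = props.getProperty(PRECOMBINE_KEY);
        return value == null ? PRECOMBINE_DEFAULT : value;
    }

    // Consumers then have to special-case "" as "no precombine field".
    static boolean hasPrecombineField(Properties props) {
        return !resolve(props).isEmpty();
    }

    public static void main(String[] args) {
        Properties unset = new Properties();

        // The workaround: inject "" so that the default is not applied.
        Properties suppressed = new Properties();
        suppressed.setProperty(PRECOMBINE_KEY, "");

        System.out.println(hasPrecombineField(unset));
        System.out.println(hasPrecombineField(suppressed));
    }
}
```

The empty string works here only because it is non-null; every consumer has 
to know about this convention and check for it separately.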



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
