alexeykudinkin commented on code in PR #6196:
URL: https://github.com/apache/hudi/pull/6196#discussion_r961984500
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java:
##########
@@ -38,7 +38,7 @@ public class HoodieCommonConfig extends HoodieConfig {
public static final ConfigProperty<Boolean> RECONCILE_SCHEMA = ConfigProperty
.key("hoodie.datasource.write.reconcile.schema")
- .defaultValue(false)
+ .defaultValue(true)
Review Comment:
Initially I was not in favor of this change, but after thinking about it a
little more, especially in light of
https://github.com/apache/hudi/pull/6358, I think this is the right thing to
do. For example, after #6358 we'd allow writes to go through even when the
new batch has columns dropped. There are then 2 scenarios, depending on
whether reconciliation is enabled:
1. If reconciliation is _enabled_: we favor the table's schema and
use it as the _writer-schema_. In that case we rewrite the incoming batch
into the table's schema before applying it to the table.
2. If reconciliation is _disabled_: we favor the incoming batch's
schema and use it as the _writer-schema_. In this case (for COW, for example)
we read the table in its existing schema, but the new base files are
written in the writer's schema (i.e. with the column dropped).
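To make the two scenarios concrete, here is a minimal sketch in plain Java. It
does not use Hudi's or Avro's APIs: `writerSchema` and `rewrite` are
hypothetical helpers, schemas are modeled as lists of column names, and
filling the dropped column with `null` when rewriting into the table's schema
is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WriterSchemaSketch {

    // Hypothetical helper (not a Hudi API): picks the writer-schema.
    // Reconciliation enabled  -> table's schema wins.
    // Reconciliation disabled -> incoming batch's schema wins.
    static List<String> writerSchema(List<String> tableSchema,
                                     List<String> batchSchema,
                                     boolean reconcile) {
        return reconcile ? tableSchema : batchSchema;
    }

    // Hypothetical helper: rewrites a record into the target schema.
    // Columns absent from the record are set to null (an assumption).
    static Map<String, Object> rewrite(Map<String, Object> record,
                                       List<String> targetSchema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : targetSchema) {
            out.put(column, record.get(column));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tableSchema = List.of("id", "name", "price");
        List<String> batchSchema = List.of("id", "name"); // "price" dropped

        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("id", 1);
        incoming.put("name", "a");

        // Scenario 1: reconciliation enabled, table's schema is preserved.
        System.out.println(rewrite(incoming, writerSchema(tableSchema, batchSchema, true)));
        // prints {id=1, name=a, price=null}

        // Scenario 2: reconciliation disabled, new files lose the column.
        System.out.println(rewrite(incoming, writerSchema(tableSchema, batchSchema, false)));
        // prints {id=1, name=a}
    }
}
```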
Both of these approaches are legitimate and could be preferred in different
circumstances. What's important here for us is to pick the right default
setting that would minimize the _surprise effect_.
Having reflected on this for some time now, I think that enabling
reconciliation by default makes more sense, as it protects the table's schema
from accidental mishaps in the incoming batches. And if somebody prefers flow
#2, they can easily opt in to it by simply disabling reconciliation.
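For illustration, opting out would amount to setting the key from this diff to
`false` in the write options. The sketch below only builds a plain
`java.util.Properties` object; the `ReconcileOptOut` class and `writeOptions`
helper are hypothetical, and how the properties get passed to the writer is
left out.

```java
import java.util.Properties;

class ReconcileOptOut {

    // Config key taken from the diff above.
    static final String RECONCILE_SCHEMA_KEY = "hoodie.datasource.write.reconcile.schema";

    // Hypothetical helper: builds write options with reconciliation on or off.
    static Properties writeOptions(boolean reconcile) {
        Properties props = new Properties();
        props.setProperty(RECONCILE_SCHEMA_KEY, Boolean.toString(reconcile));
        return props;
    }

    public static void main(String[] args) {
        // Explicitly disable reconciliation to get the batch's schema as writer-schema.
        Properties props = writeOptions(false);
        System.out.println(props.getProperty(RECONCILE_SCHEMA_KEY)); // prints false
    }
}
```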
WDYT?