nicochen opened a new issue, #3470:
URL: https://github.com/apache/amoro/issues/3470
### Description
We are aware that most Amoro-related properties should not influence the
underlying formats when those formats are used independently. This idea makes
Amoro more flexible and pluggable. Thus, Amoro catalog-level default
properties are designed and implemented to be merged on loading instead of
being written into table properties directly. However, the 'merge on loading'
mode has its own drawbacks in some cases. For instance, a company might have
more than one log-store cluster. At the very beginning, platform maintainers
configure 'log-store.address' as a catalog-level default key to indicate a
default log-store cluster. Users can create mixed-format tables without being
aware of the log-store infrastructure, and everything works. However, after
some time the single cluster becomes a bottleneck and the platform needs an
additional log-store cluster. Platform maintainers cannot simply change
*'log-store.address'*, because the 'merge on loading' implementation would
also change the log-store address for old mixed-format tables. As we can see,
some special Amoro properties should be bound to tables (written into the
underlying table properties), especially storage-related properties (such as
the log-store or table compression codec).
Specifically, the following factors determine whether a property is written
(persisted) into the underlying table metadata:
1. Configuration keys owned by the underlying format.
2. Storage-related metadata keys (e.g. log-store.xxx, compression.codec).
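
The log-store scenario above can be illustrated with a minimal sketch. It is not Amoro code: the class `MergeOnLoadingSketch`, the method `mergeOnLoad`, and the addresses `cluster-a:9092` / `cluster-b:9092` are hypothetical, and the precedence rule (values stored on the table win over catalog defaults) is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of 'merge on loading'; not the Amoro implementation.
final class MergeOnLoadingSketch {

    // Assumption: properties stored on the table override catalog-level defaults.
    static Map<String, String> mergeOnLoad(
            Map<String, String> catalogDefaults, Map<String, String> tableProperties) {
        Map<String, String> merged = new HashMap<>(catalogDefaults);
        merged.putAll(tableProperties);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> catalogDefaults = new HashMap<>();
        catalogDefaults.put("log-store.address", "cluster-a:9092");

        // Old table: the address was never persisted, it only comes from the catalog.
        Map<String, String> oldTable = new HashMap<>();

        // Table whose address was written into its own properties at creation time.
        Map<String, String> pinnedTable = new HashMap<>();
        pinnedTable.put("log-store.address", "cluster-a:9092");

        // The platform later points the catalog default at a new cluster.
        catalogDefaults.put("log-store.address", "cluster-b:9092");

        // The old table silently follows the new default: prints cluster-b:9092
        System.out.println(mergeOnLoad(catalogDefaults, oldTable).get("log-store.address"));
        // The pinned table keeps its original cluster: prints cluster-a:9092
        System.out.println(mergeOnLoad(catalogDefaults, pinnedTable).get("log-store.address"));
    }
}
```

Under these assumptions, only a property persisted with the table survives a later change of the catalog default, which is the motivation for binding storage-related keys to tables.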
Properties with the following names and prefixes are not written into the
underlying table metadata (**a blacklist**) by default; they act as 'merge on
loading' properties:
// amoro **service** related
- self-optimizing.
- optimize.
- table-expire.
- clean-orphan-file.
- clean-dangling-delete-files.
- data-expire.
- table-trash.
- tag.auto-create.
// read/write related
- read.split.open-file-cost
- read.split.planning-lookback
- read.split.target-size
- read.split.delete-ratio // removed, never used in code
- write.upsert.enabled
// mix-hive related
- base.hive.auto-sync-schema-change
- base.hive.auto-sync-data-write
- base.hive.consistent-write.enabled
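
A minimal sketch of how such a default blacklist could be applied at table creation is shown below. The class and method names (`NonPersistedFilterSketch`, `isNonPersisted`, `persistable`) are hypothetical, and treating every entry as a `startsWith` prefix is an assumption. `read.split.delete-ratio` is left out because the list above marks it as removed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the default blacklist; not the Amoro implementation.
final class NonPersistedFilterSketch {

    // Names and prefixes copied from the list in this issue.
    static final List<String> DEFAULT_NON_PERSISTED = List.of(
            // Amoro service related
            "self-optimizing.", "optimize.", "table-expire.", "clean-orphan-file.",
            "clean-dangling-delete-files.", "data-expire.", "table-trash.",
            "tag.auto-create.",
            // read/write related
            "read.split.open-file-cost", "read.split.planning-lookback",
            "read.split.target-size", "write.upsert.enabled",
            // mixed-hive related
            "base.hive.auto-sync-schema-change", "base.hive.auto-sync-data-write",
            "base.hive.consistent-write.enabled");

    // Assumption: an entry matches when the key starts with it, so full
    // property names and prefixes are handled the same way.
    static boolean isNonPersisted(String key) {
        return DEFAULT_NON_PERSISTED.stream().anyMatch(key::startsWith);
    }

    // Keeps only the properties that would be written into the table metadata.
    static Map<String, String> persistable(Map<String, String> properties) {
        Map<String, String> result = new TreeMap<>();
        properties.forEach((key, value) -> {
            if (!isNonPersisted(key)) {
                result.put(key, value);
            }
        });
        return result;
    }
}
```

With this sketch, a key such as `self-optimizing.enabled` stays a 'merge on loading' property, while `log-store.address` passes the filter and would be persisted.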
Furthermore, we also provide ways for users to configure their own
**blacklist** and **whitelist**.
```java
// A semicolon-separated list of property names (or prefixes) that are not
// written into underlying tables ('merge on loading'), in addition to the
// default names (or prefixes).
table-properties.non-persisted.additional
```
```java
// A semicolon-separated list of property names (or prefixes) excluded from
// the default 'merge on loading' properties; they can be written into table
// properties.
table-properties.non-persisted.excluded
```
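
How the two lists might combine with the defaults can be sketched as follows. This is only an illustration under stated assumptions: the names `EffectiveNonPersistedSketch`, `parse` and `effective` are hypothetical, an excluded entry is assumed to remove a default entry by exact string match, and applying the exclusions after the additions is also an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of combining the default list with user settings;
// not the Amoro implementation.
final class EffectiveNonPersistedSketch {

    // Splits a semicolon-separated value, ignoring blanks and surrounding spaces.
    static Set<String> parse(String value) {
        Set<String> entries = new LinkedHashSet<>();
        if (value == null) {
            return entries;
        }
        for (String part : value.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    // Assumption: effective list = defaults + additional - excluded, where
    // exclusion removes entries that are textually identical.
    static Set<String> effective(Set<String> defaults, String additional, String excluded) {
        Set<String> result = new LinkedHashSet<>(defaults);
        result.addAll(parse(additional));
        result.removeAll(parse(excluded));
        return result;
    }
}
```

For example, with defaults `self-optimizing.` and `write.upsert.enabled`, an additional value of `my-custom.` and an excluded value of `write.upsert.enabled`, the sketch yields `self-optimizing.` and `my-custom.` as the non-persisted entries (`my-custom.` is a made-up prefix).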
### Limitation
This feature only takes effect when tables are created through the Spark/Flink
unified catalog implementation.
It has also only been tested with the Iceberg and mixed formats.
### Use case/motivation
_No response_
### Describe the solution
Refer to the description above: when creating a table in a unified catalog
implementation, merge the configured keys (listed above) that should be
written into the table metadata.
### Subtasks
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)