nicochen opened a new issue, #3470:
URL: https://github.com/apache/amoro/issues/3470

   ### Description
   
   We are aware that most amoro-related properties should have no influence on 
the underlying formats when those formats are used independently. This idea 
makes amoro more flexible and pluggable. Thus, amoro catalog-level default 
properties are designed and implemented to be merged on loading instead of 
being written into table properties directly. However, the 'merge on loading' 
mode has its own drawbacks in some cases. For instance, a company might have 
more than one log-store cluster. At the very beginning, platform maintainers 
configure 'log-store.address' as a catalog-level default key to indicate a 
default log-store cluster. Users can directly create mixed tables without being 
aware of the log-store infrastructure, and everything works. However, after 
some time the single cluster becomes a bottleneck and the platform needs an 
additional log-store cluster. Platform maintainers cannot simply reconfigure 
*'log-store.address'*, because the 'merge on loading' implementation would 
change the log-store address for old mixed tables. As we can see, some special 
amoro properties should be bound to tables (written into the underlying table 
properties), especially storage-related properties (such as the log-store 
address or the table compression codec).
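
The log-store scenario above can be sketched in a few lines. This is only an
illustration, not amoro's actual loading code: the class name, the `load`
helper, the cluster addresses, and the rule that table-level values override
catalog defaults are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of 'merge on loading'; not amoro's real implementation.
class MergeOnLoadingDemo {

    // Assumption: catalog defaults only fill in keys the table does not store itself.
    static Map<String, String> load(Map<String, String> catalogDefaults,
                                    Map<String, String> tableProperties) {
        Map<String, String> merged = new HashMap<>(catalogDefaults);
        merged.putAll(tableProperties); // persisted table values win
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> catalogDefaults = new HashMap<>();
        catalogDefaults.put("log-store.address", "cluster-a:9092");

        // Old table: the address was never persisted, it is merged on every load.
        Map<String, String> mergedOnly = new HashMap<>();
        // Same table if the address had been persisted at creation time.
        Map<String, String> persisted = new HashMap<>();
        persisted.put("log-store.address", "cluster-a:9092");

        // The platform later points the catalog default at a new cluster.
        catalogDefaults.put("log-store.address", "cluster-b:9092");

        // The merged-only table silently moves to the new cluster.
        System.out.println(load(catalogDefaults, mergedOnly).get("log-store.address"));
        // The persisted table keeps its original cluster.
        System.out.println(load(catalogDefaults, persisted).get("log-store.address"));
    }
}
```

The first line printed is `cluster-b:9092` and the second is `cluster-a:9092`,
which is the behaviour difference this issue is about.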
   
   Specifically, the following factors determine whether a property is 
written (persisted) into the underlying table metadata:
   
   1. Configuration keys owned by the underlying format.
   2. Storage-related metadata keys (e.g. log-store.xxx, compression.codec).
   
   Properties that match the following names and prefixes are not written into 
the underlying table metadata by default (**a blacklist**); they act as 'merge 
on loading' properties:
   
   // amoro **service** related
   
   - self-optimizing.
   - optimize.
   - table-expire.
   - clean-orphan-file.
   - clean-dangling-delete-files.
   - data-expire.
   - table-trash.
   - tag.auto-create.
   
   // read/write related
   
   - read.split.open-file-cost
   - read.split.planning-lookback
   - read.split.target-size
   - read.split.delete-ratio  // removed, never used in code
   - write.upsert.enabled
   
   // mix-hive related
   
   - base.hive.auto-sync-schema-change
   - base.hive.auto-sync-data-write
   - base.hive.consistent-write.enabled
   
     
   
   Furthermore, we also provide ways for users to configure their own 
**blacklist** and **whitelist**. 
   
   ```java
   table-properties.non-persisted.additional  // a semicolon-separated list 
of property names (or prefixes) that are not written into the underlying table 
('merge on loading'), in addition to the default names (or prefixes)
   ```
   
   ```java
   table-properties.non-persisted.excluded // a semicolon-separated list of 
property names (or prefixes) excluded from the default 'merge on loading' 
properties, so that they can be written into table properties 
   ```
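
A minimal sketch of how the default blacklist and the two options above could
combine when a table is created. It is not the proposed implementation: the
class and method names are hypothetical, matching by `startsWith` is an
assumed way to cover both exact names and prefixes, and the order (apply
`excluded` first, then `additional`) is an assumption the issue does not
specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and matching rules are assumptions, not amoro APIs.
class NonPersistedFilterSketch {

    // Default names/prefixes from the list in this issue
    // (read.split.delete-ratio is omitted because it was removed).
    static final List<String> DEFAULT_NON_PERSISTED = Arrays.asList(
        "self-optimizing.", "optimize.", "table-expire.", "clean-orphan-file.",
        "clean-dangling-delete-files.", "data-expire.", "table-trash.",
        "tag.auto-create.",
        "read.split.open-file-cost", "read.split.planning-lookback",
        "read.split.target-size", "write.upsert.enabled",
        "base.hive.auto-sync-schema-change", "base.hive.auto-sync-data-write",
        "base.hive.consistent-write.enabled");

    // Splits a semicolon-separated option value, ignoring blanks and null.
    static List<String> parse(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) {
            return out;
        }
        for (String item : value.split(";")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Defaults minus 'excluded', plus 'additional' (assumed precedence).
    static List<String> effectiveNonPersisted(String additional, String excluded) {
        List<String> result = new ArrayList<>(DEFAULT_NON_PERSISTED);
        result.removeAll(parse(excluded));
        result.addAll(parse(additional));
        return result;
    }

    // Keeps only the properties that should be written into table metadata.
    static Map<String, String> persistable(Map<String, String> properties,
                                           List<String> nonPersisted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            boolean mergeOnLoading = false;
            for (String nameOrPrefix : nonPersisted) {
                if (entry.getKey().startsWith(nameOrPrefix)) {
                    mergeOnLoading = true;
                    break;
                }
            }
            if (!mergeOnLoading) {
                out.put(entry.getKey(), entry.getValue());
            }
        }
        return out;
    }
}
```

With `excluded` set to `write.upsert.enabled` and `additional` set to a
made-up prefix such as `my.custom.`, a property map containing
`self-optimizing.enabled`, `write.upsert.enabled`, `log-store.address` and
`my.custom.key` would persist only `write.upsert.enabled` and
`log-store.address`.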
   ### Limitation
   This feature only takes effect when tables are created through the 
spark/flink unified catalog implementation. 
   It has also only been tested with the iceberg and mixed formats.
   
   ### Use case/motivation
   
   _No response_
   
   ### Describe the solution
   
   Refer to the description above: when creating a table in a unified catalog 
implementation, merge the configured keys (listed above) that should be written 
into the table metadata.
   
   ### Subtasks
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
