hililiwei opened a new pull request, #877:
URL: https://github.com/apache/incubator-paimon/pull/877

   *(Please specify the module before the PR name: [core] ... or [flink] ...)*
   
   ### Purpose
   
   *(What is the purpose of the change, or the associated issue)*
   
   Refactor how Paimon records snapshots, and add a `TableMetadata` object.
   
   Why do we need this change? 
   
   First, a linear snapshot history lets us build advanced features such as branching. Second, relying on file system operations to manage snapshots is unfriendly, especially on object storage: we should avoid operations such as list and rename, which are very expensive there. Finally, this change helps our transaction management.
   
   ### Tests
   
   *(List UT and IT cases to verify this change)*
   
   ### API and Format 
   
   *(Does this change affect API or storage format)*
   
   ### Documentation
   
   Add a `parentId` field to the Snapshot object to record its parent, so that the snapshot records form a linear chain.
   
   Snapshot:
   
   
   | field    | type | description            |
   | -------- | ---- | ---------------------- |
   | parentId | Long | The parent snapshot id |
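
As a rough illustration of what the `parentId` link enables, the sketch below walks from one snapshot back to the root of its chain. `SnapshotNode` and `SnapshotLineage` are hypothetical names used only here, not classes from this PR, and the snapshot is reduced to the two ids involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified snapshot: only the id and the proposed parentId. */
final class SnapshotNode {
    final long id;
    final Long parentId; // null for the first snapshot in the chain

    SnapshotNode(long id, Long parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

final class SnapshotLineage {
    /** Returns snapshot ids from startId back towards the root, newest first. */
    static List<Long> ancestors(Map<Long, SnapshotNode> byId, long startId) {
        List<Long> chain = new ArrayList<>();
        Long current = startId;
        while (current != null) {
            SnapshotNode node = byId.get(current);
            // Stop if the parent is unknown (for example expired) or if a cycle is detected.
            if (node == null || chain.contains(node.id)) {
                break;
            }
            chain.add(node.id);
            current = node.parentId;
        }
        return chain;
    }
}
```

For a chain 1 <- 2 <- 3, `SnapshotLineage.ancestors(byId, 3L)` returns `[3, 2, 1]`.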
   
   
   TableMetadata:

   | field               | type           | description                         |
   | ------------------- | -------------- | ----------------------------------- |
   | version             | Integer        | The version of the table metadata   |
   | last-updated-ms     | Long           | Update time of the latest snapshot  |
   | properties          | Map            |                                     |
   | current-snapshot-id | Long           | The latest snapshot id              |
   | snapshots           | List<Snapshot> | A collection of all valid snapshots |
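
To make the table concrete, here is a minimal in-memory sketch of such an object. It is not the class added by this PR: `TableMetadataSketch` and `SnapshotRef` are hypothetical names, the `Map<String, String>` type for `properties` is an assumption (the table only says `Map`), and serialization is left out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Stand-in for Paimon's Snapshot, reduced to the two ids used here. */
final class SnapshotRef {
    final long id;
    final Long parentId; // null for the first snapshot

    SnapshotRef(long id, Long parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

/** Hypothetical in-memory form of the proposed TableMetadata. */
final class TableMetadataSketch {
    final int version;                    // version
    final long lastUpdatedMs;             // last-updated-ms
    final Map<String, String> properties; // properties (value type is an assumption)
    final long currentSnapshotId;         // current-snapshot-id
    final List<SnapshotRef> snapshots;    // snapshots

    TableMetadataSketch(
            int version,
            long lastUpdatedMs,
            Map<String, String> properties,
            long currentSnapshotId,
            List<SnapshotRef> snapshots) {
        this.version = version;
        this.lastUpdatedMs = lastUpdatedMs;
        this.properties = Collections.unmodifiableMap(properties);
        this.currentSnapshotId = currentSnapshotId;
        this.snapshots = Collections.unmodifiableList(snapshots);
    }

    /** Looks up the snapshot that current-snapshot-id points at, if it is still listed. */
    Optional<SnapshotRef> currentSnapshot() {
        return snapshots.stream().filter(s -> s.id == currentSnapshotId).findFirst();
    }
}
```

Because every metadata file carries the full list of valid snapshots, a reader holding one `TableMetadataSketch` can resolve the current snapshot without listing a directory.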
   
   With the new approach, a file named `METADATA_LATEST` records the latest snapshot id. Each new snapshot is saved in a file prefixed with `metadata-`, and each such file contains the full set of valid snapshots.
   
   ```
   -rw-rw-r-- 1 dev dev 696 Apr 11 14:44 metadata-1
   -rw-rw-r-- 1 dev dev   1 Apr 11 14:44 METADATA_LATEST
   ```
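
The lookup this layout allows can be sketched as follows. This is only an illustration on a local directory with `java.nio.file`, not Paimon's own file I/O layer, and it assumes that the number stored in `METADATA_LATEST` is also the suffix of the matching `metadata-` file, as the listing above suggests. `LatestMetadataLocator` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LatestMetadataLocator {
    static final String HINT_FILE = "METADATA_LATEST";
    static final String METADATA_PREFIX = "metadata-";

    /** Assumption: the hint content is a decimal id that doubles as the file suffix. */
    static String metadataFileName(String hintContent) {
        long id = Long.parseLong(hintContent.trim());
        return METADATA_PREFIX + id;
    }

    /** Resolves the latest metadata file with a single read and no directory listing. */
    static Path latestMetadataFile(Path dir) throws IOException {
        byte[] raw = Files.readAllBytes(dir.resolve(HINT_FILE));
        return dir.resolve(metadataFileName(new String(raw, StandardCharsets.UTF_8)));
    }
}
```

The point of the design is that the reader goes straight from the hint file to one metadata file, which avoids the list operation that is expensive on object storage.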
   
   
   ##### How to parse the new version of the snapshot?
   
   Add `SnapshotManagerV2`, which extends `SnapshotManager`, to parse `TableMetadata`. (`SnapshotManager` has been refactored into an abstract class, and the original implementation is renamed to `SnapshotManagerV1`.)
   
   Some key points: 
   
   1. If the table uses the older format, it is upgraded to the new format when `SnapshotManagerV2` is initialized. Alternatively, we could upgrade it on the **commit** action.
   2. `SnapshotManagerV2` contains logic to determine whether the table should be parsed by `SnapshotManagerV1` (old-format table, no `TableMetadata`).
   
   ##### SnapshotManagerV2 and SnapshotManagerV1?
   
   Both `SnapshotManagerV2` and `SnapshotManagerV1` extend `SnapshotManager`. When `SnapshotManagerV2` encounters an old-format table, it uses the `SnapshotManagerV1` methods instead.
   
   ##### Compatibility
   
   `SnapshotManagerV2` itself cannot parse old tables, but I added a check to almost every public method. If the check decides that the target table should be parsed by V1, the method delegates to V1.
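
The delegation pattern described here might look roughly like the sketch below. The class names, the `latestSnapshotId()` method, and the `hasTableMetadata` check are all hypothetical stand-ins chosen for illustration; the real classes in this PR have more methods and read from storage.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical abstract base, standing in for the refactored SnapshotManager. */
abstract class SnapshotManagerSketch {
    abstract Long latestSnapshotId();
}

/** Stand-in for V1: answers from the old snapshot layout (here just a fixed value). */
final class V1Sketch extends SnapshotManagerSketch {
    private final Long latestFromOldLayout;

    V1Sketch(Long latestFromOldLayout) {
        this.latestFromOldLayout = latestFromOldLayout;
    }

    @Override
    Long latestSnapshotId() {
        return latestFromOldLayout;
    }
}

/** Stand-in for V2: every public method first checks whether to fall back to V1. */
final class V2Sketch extends SnapshotManagerSketch {
    private final BooleanSupplier hasTableMetadata; // assumed check: does TableMetadata exist?
    private final Long currentSnapshotId;           // would come from TableMetadata
    private final SnapshotManagerSketch v1;

    V2Sketch(BooleanSupplier hasTableMetadata, Long currentSnapshotId, SnapshotManagerSketch v1) {
        this.hasTableMetadata = hasTableMetadata;
        this.currentSnapshotId = currentSnapshotId;
        this.v1 = v1;
    }

    private boolean shouldUseV1() {
        return !hasTableMetadata.getAsBoolean();
    }

    @Override
    Long latestSnapshotId() {
        if (shouldUseV1()) {
            return v1.latestSnapshotId(); // old-format table: delegate
        }
        return currentSnapshotId;
    }
}
```

Repeating the check in each method keeps callers unaware of the table format, at the cost of one extra condition per call.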
   
   
   ##### When to upgrade the table?
   
   * upgrade the table when SnapshotManager is initialized;
   
   * upgrade it when submitting a new snapshot; 
   
   * provide a Paimon tool that lets developers upgrade manually.
   
   
   So there is a trade-off here. Because Paimon is currently incubating and still at an early stage, I think we don't need to worry too much about compatibility.
   
   
   


