hililiwei opened a new pull request, #877: URL: https://github.com/apache/incubator-paimon/pull/877
*(Please specify the module before the PR name: [core] ... or [flink] ...)*

### Purpose

*(What is the purpose of the change, or the associated issue)*

Refactor the snapshot recording method of Paimon, and add the `TableMetadata` object.

Why do we need this change?

First, a linear history enables advanced features such as branching. Second, relying on the file system to manage snapshots is unfriendly, especially for object storage: we should avoid List, Rename, and similar operations, which are very expensive there. Finally, the change is conducive to our transaction management.

### Tests

*(List UT and IT cases to verify this change)*

### API and Format

*(Does this change affect API or storage format)*

### Documentation

Add a `parentId` field to the `Snapshot` object to record its ancestor relationship, so that the snapshot records form a linear chain.

Snapshot:

| field    | type | description            |
| -------- | ---- | ---------------------- |
| parentId | Long | The parent snapshot id |

TableMetadata:

| field               | type           | description                         |
| ------------------- | -------------- | ----------------------------------- |
| version             | Integer        | The version of the table metadata   |
| last-updated-ms     | Long           | Update time of the latest snapshot  |
| properties          | Map            |                                     |
| current-snapshot-id | Long           | The latest snapshot id              |
| snapshots           | List<Snapshot> | A collection of all valid snapshots |

The new layout has a file named `METADATA_LATEST`, which records the latest snapshot id. Each new snapshot is saved in a file prefixed with `metadata-`, and each such file contains the full set of valid snapshots.

```
-rw-rw-r-- 1 dev dev 696 Apr 11 14:44 metadata-1
-rw-rw-r-- 1 dev dev   1 Apr 11 14:44 METADATA_LATEST
```

##### How to parse the new version of the snapshot?

Add `SnapshotManagerV2` to parse `TableMetadata`. It extends `SnapshotManager` (this class has been refactored into an abstract class, and the original implementation is renamed to `SnapshotManagerV1`).

Some key points:

1. If the table is in the older version, it is upgraded to a new-version table when `SnapshotManagerV2` is initialized. Alternatively, we could upgrade it on the **commit** action.
2. `SnapshotManagerV2` contains logic to determine whether the table should be parsed by `SnapshotManagerV1` (an old-version table with no `TableMetadata`).

##### SnapshotManagerV2 and SnapshotManagerV1?

Both `SnapshotManagerV2` and `SnapshotManagerV1` extend `SnapshotManager`. When `SnapshotManagerV2` parses an older-version table, it uses `SnapshotManagerV1`'s methods instead.

##### Compatibility

`SnapshotManagerV2` itself cannot parse old tables, but I added a check to almost every public method. If the check decides that the target table should be parsed by V1, the method uses V1 instead.

##### When to upgrade the table?

* upgrade the table when `SnapshotManager` is initialized;
* upgrade it when submitting a new snapshot;
* provide a tool in Paimon that lets developers upgrade manually.

So there is a trade-off here. Because Paimon is currently in the incubator and still at an early stage, I think we don't need to worry too much about compatibility.
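
To make the `parentId` chain and the `TableMetadata` fields listed above more concrete, here is a minimal in-memory sketch. It is not the code in this PR: the class names `SnapshotSketch` and `TableMetadataSketch`, the `append` and `lineage` helpers, and the `Map<String, String>` type for `properties` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Snapshot; only id and parentId are modeled. */
final class SnapshotSketch {
    final long id;
    final Long parentId; // null for the first snapshot in the chain

    SnapshotSketch(long id, Long parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

/** Hypothetical in-memory view of the TableMetadata fields (names are assumptions). */
final class TableMetadataSketch {
    final int version;
    final long lastUpdatedMs;
    final Map<String, String> properties; // value type is an assumption
    final Long currentSnapshotId;         // null while the table has no snapshot
    final List<SnapshotSketch> snapshots;

    TableMetadataSketch(int version, long lastUpdatedMs, Map<String, String> properties,
                        Long currentSnapshotId, List<SnapshotSketch> snapshots) {
        this.version = version;
        this.lastUpdatedMs = lastUpdatedMs;
        this.properties = properties;
        this.currentSnapshotId = currentSnapshotId;
        this.snapshots = snapshots;
    }

    static TableMetadataSketch empty() {
        return new TableMetadataSketch(1, 0L, Collections.emptyMap(), null, Collections.emptyList());
    }

    /** Returns new metadata with one more snapshot whose parent is the current snapshot. */
    TableMetadataSketch append(long newId, long nowMs) {
        List<SnapshotSketch> next = new ArrayList<>(snapshots);
        next.add(new SnapshotSketch(newId, currentSnapshotId));
        return new TableMetadataSketch(version, nowMs, properties, newId, next);
    }

    /** Follows parentId links from the current snapshot back to the root, newest first. */
    List<Long> lineage() {
        Map<Long, SnapshotSketch> byId = new HashMap<>();
        for (SnapshotSketch s : snapshots) {
            byId.put(s.id, s);
        }
        List<Long> chain = new ArrayList<>();
        Long cursor = currentSnapshotId;
        while (cursor != null) {
            SnapshotSketch s = byId.get(cursor);
            if (s == null) {
                break; // parent is no longer in the valid snapshot set
            }
            chain.add(s.id);
            cursor = s.parentId;
        }
        return chain;
    }
}

final class LineageDemo {
    public static void main(String[] args) {
        TableMetadataSketch m = TableMetadataSketch.empty()
                .append(1L, 100L)
                .append(2L, 200L)
                .append(3L, 300L);
        System.out.println(m.lineage()); // prints [3, 2, 1]
    }
}
```

Because every metadata object carries the whole valid snapshot set, the history can be walked from a single read, without listing a directory.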

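
The V1 fallback described under Compatibility can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `*Sketch` classes, the `latestSnapshotId` and `shouldUseV1` methods, the `snapshot-` file prefix for the old layout, and the use of a `Map` of file name to content in place of a real file system are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstract base, standing in for the refactored SnapshotManager. */
abstract class SnapshotManagerSketch {
    /** Returns the latest snapshot id, or null if the table has no snapshot. */
    abstract Long latestSnapshotId();
}

/** Old layout (assumed): one "snapshot-N" file per snapshot, found by listing. */
final class SnapshotManagerV1Sketch extends SnapshotManagerSketch {
    private static final String PREFIX = "snapshot-"; // assumed file prefix
    private final Map<String, String> files;          // file name -> content

    SnapshotManagerV1Sketch(Map<String, String> files) {
        this.files = files;
    }

    @Override
    Long latestSnapshotId() {
        Long max = null;
        for (String name : files.keySet()) { // models a List operation
            if (name.startsWith(PREFIX)) {
                long id = Long.parseLong(name.substring(PREFIX.length()));
                if (max == null || id > max) {
                    max = id;
                }
            }
        }
        return max;
    }
}

/** New layout: METADATA_LATEST records the latest snapshot id. */
final class SnapshotManagerV2Sketch extends SnapshotManagerSketch {
    private static final String LATEST = "METADATA_LATEST";
    private final Map<String, String> files;
    private final SnapshotManagerV1Sketch v1;

    SnapshotManagerV2Sketch(Map<String, String> files) {
        this.files = files;
        this.v1 = new SnapshotManagerV1Sketch(files);
    }

    /** The per-method check: an old table has no table metadata files. */
    private boolean shouldUseV1() {
        return !files.containsKey(LATEST);
    }

    @Override
    Long latestSnapshotId() {
        if (shouldUseV1()) {
            return v1.latestSnapshotId(); // delegate for old-version tables
        }
        return Long.parseLong(files.get(LATEST).trim()); // single read, no listing
    }
}

final class DelegationDemo {
    public static void main(String[] args) {
        Map<String, String> oldTable = new HashMap<>();
        oldTable.put("snapshot-1", "");
        oldTable.put("snapshot-2", "");
        System.out.println(new SnapshotManagerV2Sketch(oldTable).latestSnapshotId());

        Map<String, String> newTable = new HashMap<>();
        newTable.put("METADATA_LATEST", "1");
        newTable.put("metadata-1", "{}");
        System.out.println(new SnapshotManagerV2Sketch(newTable).latestSnapshotId());
    }
}
```

Keeping the check inside each public method means callers only ever hold a V2 manager, at the cost of repeating the check on every call; upgrading the table at initialization or at commit time would remove that repetition.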