luoyuxia opened a new issue, #2361: URL: https://github.com/apache/fluss/issues/2361
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/fluss/issues) and found nothing similar.

### Description

Currently, the logic for enabling datalake is not strict enough.

- Case 1
  1. The user creates a lake table in Fluss, so table Fluss A and table Paimon A are created.
  2. The user disables datalake.
  3. The user drops Paimon A and creates Paimon A again, but writes some data into Paimon A directly.
  4. The user alters the table to enable datalake. The alter passes, but the data is not consistent.

  To guard against what happens in step 3, we can check, when datalake is enabled again, that the snapshot of table Paimon A is consistent with Fluss A. In v2, we record a file path in the `fluss-offsets` property, so we can just parse the table id from the file path; see `remoteLakeTableSnapshotDir`. Note that we also need to consider v1: in v1, we store the full bucket offsets, from which we can still determine the table id.

- Case 2
  1. The user creates a lake table in Fluss, so table Fluss A and table Paimon A are created.
  2. The user disables datalake.
  3. The data expires because of TTL.
  4. The user alters the table to enable datalake, but tiering fails because the data has expired and tiering cannot resume from the last tiered offset.

  We can add a lossy check: if `current_timestamp` - `timestamp of latest snapshot` > `table ttl`, we can refuse to enable datalake, since enabling it will cause tiering to fail. Note that tiering may still fail even when this check passes, but the check should cover most cases.

### Willingness to contribute

- [ ] I'm willing to submit a PR!
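
To make the two proposed checks concrete, here is a minimal Java sketch. It is not Fluss code: the class `EnableDatalakeChecks` and both method names are hypothetical, and it assumes the caller has already extracted the table id recorded in the latest lake snapshot (from the `fluss-offsets` file path in v2, or from the stored bucket offsets in v1) as well as the latest snapshot timestamp. It also assumes that a snapshot with no recorded table id should be treated as inconsistent.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalLong;

/** Hypothetical pre-checks before re-enabling datalake; a sketch, not the Fluss implementation. */
public class EnableDatalakeChecks {

    /**
     * Case 1: the table id recorded in the latest lake snapshot must match the Fluss table id.
     * Assumption: an empty recordedTableId means the snapshot carries no Fluss offsets
     * information (for example, data was written to the lake table directly).
     */
    public static boolean isSameTable(long flussTableId, OptionalLong recordedTableId) {
        return recordedTableId.isPresent() && recordedTableId.getAsLong() == flussTableId;
    }

    /**
     * Case 2 (lossy): returns true when the latest snapshot is older than the table TTL,
     * i.e. current_timestamp - timestamp of latest snapshot > table ttl.
     */
    public static boolean isLikelyExpired(Instant now, Instant latestSnapshotTime, Duration tableTtl) {
        return Duration.between(latestSnapshotTime, now).compareTo(tableTtl) > 0;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Instant now = Instant.parse("2024-01-10T00:00:00Z");
        Instant latestSnapshot = Instant.parse("2024-01-01T00:00:00Z");
        Duration ttl = Duration.ofDays(7);

        boolean consistent = isSameTable(5L, OptionalLong.of(5L));
        boolean expired = isLikelyExpired(now, latestSnapshot, ttl);

        // Enabling would be refused if the table ids differ or the data is likely expired.
        boolean allowEnable = consistent && !expired;
        System.out.println("allow enable datalake: " + allowEnable);
    }
}
```

With these sample values the snapshot is nine days old against a seven-day TTL, so the sketch refuses to enable datalake even though the table ids match.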
