luoyuxia opened a new issue, #2361:
URL: https://github.com/apache/fluss/issues/2361

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/fluss/issues) and 
found nothing similar.
   
   
   ### Description
   
   Currently, the checks performed when enabling datalake are not strict enough.
   
   - case 1
   1: The user creates a lake table in Fluss, so tables Fluss A and Paimon A are created.
   2: The user disables datalake.
   3: The user drops Paimon A and creates Paimon A again, but writes some data into Paimon A directly.
   4: The user alters the table to enable datalake. The alter passes, but the data is not consistent.
   
   To solve this case, in step 4, we can check that the snapshot of table Paimon A is consistent with Fluss A.
   In v2, we record a file in the `fluss-offsets` property, so we can just parse the table id from the file path.
   See `remoteLakeTableSnapshotDir`.
   
   Note that we also need to consider v1. In v1, we store the full bucket offsets, from which we can still determine the table id.
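
   A minimal sketch of the v2 check described above. The path layout is an assumption (a `<tableName>-<tableId>` segment somewhere in the path); the real layout is whatever `remoteLakeTableSnapshotDir` produces, and the class and method names here are hypothetical.

```java
import java.util.OptionalLong;

public class LakeTableIdCheck {

    // Assumed layout (not confirmed by this issue): the path recorded in
    // `fluss-offsets` contains a segment of the form "<tableName>-<tableId>".
    public static OptionalLong parseTableId(String offsetsFilePath, String tableName) {
        String prefix = tableName + "-";
        for (String segment : offsetsFilePath.split("/")) {
            if (segment.startsWith(prefix)) {
                String suffix = segment.substring(prefix.length());
                // Accept only a plain decimal id that fits in a long.
                if (suffix.matches("\\d{1,18}")) {
                    return OptionalLong.of(Long.parseLong(suffix));
                }
            }
        }
        return OptionalLong.empty();
    }

    // True only if the lake snapshot was written for the given Fluss table id.
    public static boolean belongsToTable(
            String offsetsFilePath, String tableName, long flussTableId) {
        OptionalLong parsed = parseTableId(offsetsFilePath, tableName);
        return parsed.isPresent() && parsed.getAsLong() == flussTableId;
    }
}
```

   If `belongsToTable` returns false (or no id can be parsed), the alter to enable datalake would be rejected. A re-created Paimon A that was written directly has no matching `fluss-offsets` entry, so it fails this check.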
   
   
   - case 2
   1: The user creates a lake table in Fluss, so tables Fluss A and Paimon A are created.
   2: The user disables datalake.
   3: The data expires due to TTL.
   4: The user alters the table to enable datalake, but tiering fails because the data has expired and tiering can't resume from the last tiered offset.
   
   We can add a lossy check: if `current_timestamp` - `timestamp of latest snapshot` > `table ttl`, we refuse to enable datalake, since enabling it would cause tiering to fail.
   Note that tiering may still fail in some cases, but this check should cover most of them.
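
   The lossy check above could look roughly like this. It is only a sketch: the class and method names are hypothetical, and all three values are assumed to be in epoch milliseconds.

```java
public class LakeEnableTtlCheck {

    // Returns false when the latest lake snapshot is older than the table TTL,
    // i.e. when the data after the last tiered offset has likely expired.
    public static boolean canEnableDatalake(
            long currentTimestampMs, long latestSnapshotTimestampMs, long tableTtlMs) {
        long sinceLatestSnapshotMs = currentTimestampMs - latestSnapshotTimestampMs;
        return sinceLatestSnapshotMs <= tableTtlMs;
    }
}
```

   The check is deliberately conservative in only one direction: a snapshot younger than the TTL does not guarantee that every bucket can still resume from its tiered offset, which is why tiering may still fail after the check passes.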
   
   ### Willingness to contribute
   
   - [ ] I'm willing to submit a PR!

