kbendick commented on issue #3447:
URL: https://github.com/apache/iceberg/issues/3447#issuecomment-958179727


   A few things come to mind. First and foremost, you should check out the docs on:
   - table maintenance: https://iceberg.apache.org/#maintenance/#table-maintenance
   - streaming table maintenance: https://iceberg.apache.org/#spark-structured-streaming/#maintenance-for-streaming-tables
   
   If you're running the expire snapshots operation, keep in mind that there is 
an option for how many days' worth of data you want to retain. By default, the 
Spark action to expire snapshots retains 5 days' worth of snapshots, which 
would explain why no files are being removed.
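
   To make that default concrete, here's a minimal sketch of calling the Spark 
`expire_snapshots` procedure with no cutoff (the catalog and table names are 
placeholders); with no `older_than` it falls back to the default retention 
window, so a table with only ~10 hours of history would have nothing to expire:

   ```sql
   -- Minimal sketch (placeholder names): with no older_than argument, the
   -- procedure keeps the default multi-day retention window, so recent
   -- snapshots are not expired.
   CALL my_catalog.system.expire_snapshots(table => 'db.my_table');
   ```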
   
   When you run the expire snapshots job, do you see any log output indicating 
which options are enabled?
   Some links that might help:
   - Javadoc for the action itself: https://iceberg.apache.org/#javadoc/0.12.0/org/apache/iceberg/ExpireSnapshots.html
   - Docs for the options of the Spark maintenance procedures: https://iceberg.apache.org/#spark-procedures/#metadata-management
   
   I'd also make sure that your table doesn't have `gc.enabled = false` as a 
table property. If that's set to `false` (the default should be `true`), then 
files won't be removed regardless.
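
   For example, a quick sketch in Spark SQL to check and, if needed, reset that 
property (the table name is a placeholder):

   ```sql
   -- Inspect the current table properties (placeholder table name)
   SHOW TBLPROPERTIES db.my_table;

   -- If gc.enabled came back false, re-enable it so expire snapshots /
   -- remove orphan files are actually allowed to delete files
   ALTER TABLE db.my_table SET TBLPROPERTIES ('gc.enabled' = 'true');
   ```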
   
   Additionally, it's possible that you have orphan files from commits that did 
not succeed and needed to be retried. You should also run the `remove orphan 
files` action: https://iceberg.apache.org/#maintenance/#remove-orphan-files
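
   If you're using the Spark procedures, a minimal sketch of that call looks 
roughly like this (catalog and table names are placeholders; `dry_run => true` 
only lists what would be deleted, which is a safe first pass):

   ```sql
   -- Minimal sketch: list the orphan files that would be removed (placeholder names).
   -- Drop dry_run (or set it to false) to actually delete them; the procedure
   -- also accepts an older_than argument with its own multi-day default.
   CALL my_catalog.system.remove_orphan_files(
     table => 'db.my_table',
     dry_run => true
   );
   ```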
   
   How are you running the actions? With Java code or with Spark? I don't 
believe that Flink presently supports all of the maintenance procedures. Either 
way, given that it's only 10 hours' worth of data, you'll definitely need to 
pass in the timestamp you want to expire older than.
   
   Try it out, making sure that you pass something for the `older_than` field 
of the Spark procedure (or the equivalent method if using the Java API). 
Something like `now() - INTERVAL '4 hours'` or `System.currentTimeMillis() - 
Duration.ofHours(4).toMillis()`.
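
   Putting that together, a sketch of the procedure call with an explicit 
cutoff (catalog, table name, and the timestamp literal are placeholders; 
depending on your version the argument may need to be a timestamp literal 
rather than an expression):

   ```sql
   -- Minimal sketch: expire snapshots older than a chosen cutoff, e.g. "4 hours
   -- ago" (replace the placeholder literal with your actual cutoff).
   -- retain_last keeps at least one snapshot so the table keeps a current state.
   CALL my_catalog.system.expire_snapshots(
     table => 'db.my_table',
     older_than => TIMESTAMP '2021-11-02 08:00:00.000',
     retain_last => 1
   );
   ```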
   
   Let us know if that solves your problem, particularly passing in a maximum 
age of files to keep. The default is well over 10 hours in all cases.

