aokolnychyi opened a new issue #1599:
URL: https://github.com/apache/iceberg/issues/1599
We can define the following stored procedure for removing orphan files.
```
CALL catalog.schema.remove_orphan_files(
  namespace => 'namespace_name',
  table => 'table_name',
  older_than => timestamp,
  ignore_scheme => true, -- optional
  ignore_authority => true, -- optional
  dry_run => true -- optional
)
```
The stored procedure should return the list of file locations that are
considered orphans.
It can be called with an interval:
```
CALL catalog.schema.remove_orphan_files(
  namespace => 'namespace_name',
  table => 'table_name',
  older_than => CURRENT_TIMESTAMP - INTERVAL '5' DAYS,
  dry_run => true
)
```
Or with a given timestamp:
```
CALL catalog.schema.remove_orphan_files(
  table => 'schema.table_name',
  older_than => TIMESTAMP '2020-01-19 03:14:07',
  dry_run => false
)
```
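To make the intent of `older_than`, `ignore_scheme` and `ignore_authority` more concrete, here is a minimal Python sketch of how the orphan list could be computed. It is only an illustration, not the proposed implementation: the function names `normalize` and `find_orphan_files`, the dict-based file listing, and the exact comparison rules are assumptions.

```python
from datetime import datetime
from urllib.parse import urlparse


def normalize(location, ignore_scheme=False, ignore_authority=False):
    """Reduce a file location to the key used for comparison (hypothetical helper)."""
    parsed = urlparse(location)
    scheme = "" if ignore_scheme else parsed.scheme
    authority = "" if ignore_authority else parsed.netloc
    return (scheme, authority, parsed.path)


def find_orphan_files(listed, referenced, older_than,
                      ignore_scheme=False, ignore_authority=False):
    """Return listed locations that are old enough and not referenced.

    listed:     dict mapping file location -> last-modified datetime
    referenced: iterable of locations reachable from table metadata
    older_than: only files modified strictly before this are candidates
    """
    known = {normalize(loc, ignore_scheme, ignore_authority) for loc in referenced}
    return sorted(
        loc
        for loc, modified in listed.items()
        if modified < older_than
        and normalize(loc, ignore_scheme, ignore_authority) not in known
    )


# Example: the table metadata uses "s3://" while the listing uses "s3a://".
listed = {
    "s3a://bucket/t/data/a.parquet": datetime(2020, 1, 1),
    "s3a://bucket/t/data/b.parquet": datetime(2020, 1, 1),
    "s3a://bucket/t/data/c.parquet": datetime(2020, 1, 20),  # too recent
}
referenced = ["s3://bucket/t/data/a.parquet"]
cutoff = datetime(2020, 1, 19, 3, 14, 7)

strict = find_orphan_files(listed, referenced, cutoff)
relaxed = find_orphan_files(listed, referenced, cutoff, ignore_scheme=True)
```

In this sketch a strict comparison reports `a.parquet` as an orphan only because the scheme differs, which is the kind of false positive `ignore_scheme` is meant to avoid. With `dry_run => true` the procedure would presumably return such a list without deleting anything.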
**Note**: we must validate that `older_than` is far enough in the past. We should
either require it to be at least 1 day old or allow users to configure the
minimum interval through table properties.
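The check described in the note could look roughly like the following Python sketch. The function names, the 1-day default, and the table property key `orphan-files.min-age-ms` are all hypothetical and are used here only for illustration; no such property is defined by this issue.

```python
from datetime import datetime, timedelta, timezone

# Assumed default, taken from the "at least 1 day old" option in the note.
DEFAULT_MIN_AGE = timedelta(days=1)

# Hypothetical table property name, not an existing Iceberg property.
MIN_AGE_PROPERTY = "orphan-files.min-age-ms"


def min_age_from_properties(properties):
    """Read the minimum age from table properties, falling back to the default."""
    raw = properties.get(MIN_AGE_PROPERTY)
    if raw is None:
        return DEFAULT_MIN_AGE
    return timedelta(milliseconds=int(raw))


def validate_older_than(older_than, properties=None, now=None):
    """Raise ValueError unless `older_than` is at least the minimum age in the past.

    `older_than` and `now` must both be timezone-aware datetimes.
    """
    now = now if now is not None else datetime.now(timezone.utc)
    min_age = min_age_from_properties(properties or {})
    if now - older_than < min_age:
        raise ValueError(
            f"older_than must be at least {min_age} before the current time"
        )
    return older_than


# Example usage with a fixed clock.
now = datetime(2020, 1, 20, tzinfo=timezone.utc)
ok = validate_older_than(now - timedelta(days=5), now=now)
```

Reading the limit from a table property keeps the safe default while still letting users who understand the risk shorten the interval for a specific table.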