laskoviymishka commented on code in PR #1104:
URL: https://github.com/apache/iceberg-go/pull/1104#discussion_r3277040701
##########
catalog/sql/sql_test.go:
##########
@@ -684,6 +684,39 @@ func (s *SqliteCatalogTestSuite) TestDropTableNotExist() {
}
}
+func (s *SqliteCatalogTestSuite) TestPurgeTable() {
+ tests := []struct {
+ cat *sqlcat.Catalog
+ tblID table.Identifier
+ }{
+ {s.getCatalogMemory(), s.randomTableIdentifier()},
+ {s.getCatalogSqlite(), s.randomHierarchicalIdentifier()},
+ }
+
+ for _, tt := range tests {
+ ns := catalog.NamespaceFromIdent(tt.tblID)
+
s.Require().NoError(tt.cat.CreateNamespace(context.Background(), ns, nil))
+ tbl, err := tt.cat.CreateTable(context.Background(), tt.tblID,
tableSchemaNested)
+ s.Require().NoError(err)
+
+ metaFile := strings.TrimPrefix(tbl.MetadataLocation(),
"file://")
+ s.FileExists(metaFile)
+
+ // Assert that the catalog implements PurgeableTable
+ purger, ok := any(tt.cat).(catalog.PurgeableTable)
+ s.Require().True(ok, "catalog must implement PurgeableTable")
+
+ s.NoError(purger.PurgeTable(context.Background(), tt.tblID))
+
+ // The table catalog entry should be gone
+ _, err = tt.cat.LoadTable(context.Background(), tt.tblID)
+ s.ErrorIs(err, catalog.ErrNoSuchTable)
+
+ // The physical file should be gone
+ s.NoFileExists(metaFile)
Review Comment:
Only checks the single metadata.json is gone. `CreateTable` against an empty
table writes no data files, so the walk + bulk-delete path is never exercised.
The test would still pass if `PurgeTableFiles` only removed metadata.json and
silently failed on everything else.
I'd extend with an `AppendTable` commit so there are real data + manifest
files, then assert the directory is empty.
##########
catalog/internal/utils.go:
##########
@@ -241,3 +242,37 @@ func UpdateAndStageTable(ctx context.Context, current
*table.Table, ident table.
),
}, nil
}
+
+// PurgeTableFiles physically deletes all files under the table's warehouse
location using WalkDir.
+// It is best-effort — if interrupted, orphaned files may remain.
+func PurgeTableFiles(ctx context.Context, tbl *table.Table) (err error) {
Review Comment:
Walks `Metadata().Location()`, but tables can write outside that root via
`write.data.path` / `write.metadata.path` (table/properties.go:27-28).
PyIceberg/Java walk manifest entries instead, which catches that case.
At minimum document the limitation on `PurgeableTable`. Better, union in
manifest-referenced paths that fall outside `Location()`. wdyt?
##########
catalog/glue/glue.go:
##########
@@ -405,6 +405,28 @@ func (c *Catalog) DropTable(ctx context.Context,
identifier table.Identifier) er
return nil
}
+func (c *Catalog) PurgeTable(ctx context.Context, identifier table.Identifier)
error {
+ tbl, err := c.LoadTable(ctx, identifier)
+ if err != nil {
+ if errors.Is(err, catalog.ErrNoSuchTable) {
+ return err
+ }
Review Comment:
This (same in hive.go, sql.go) silently downgrades to metadata-only on any
non-`ErrNoSuchTable` `LoadTable` failure: network blip, S3 403, corrupt
metadata JSON. User asked for `--purge`, catalog entry vanishes, files stay, no
signal.
I'd return the error. If the user wants metadata-only on load failure they
can re-run `drop table` without the flag. At minimum a typed
`ErrPurgeIncomplete` the CLI surfaces, not a silent fallback.
##########
catalog/glue/glue_test.go:
##########
@@ -1447,3 +1447,28 @@ func TestIsConcurrentModificationException(t *testing.T)
{
require.False(t, isConcurrentModificationException(errors.New("network
timeout")))
require.False(t, isConcurrentModificationException(nil))
}
+
Review Comment:
No `FS` plumbed in, so `PurgeTableFiles` never gets exercised. The test only
asserts `LoadTable` + `DropTable` get called, which `DropTable` tests already
cover. Same shape in hive_test.go:1207.
Either drop these or thread an in-memory FS so the purge path is real. As
written they'd still pass if we deleted `PurgeTableFiles` entirely.
##########
catalog/catalog.go:
##########
@@ -167,6 +167,12 @@ type TransactionalCatalog interface {
CommitTransaction(ctx context.Context, commits []table.TableCommit)
error
}
+// PurgeableTable is an optional interface that catalogs can implement
Review Comment:
Godoc could carry the contract callers actually need: purge is best-effort,
file-deletion errors may or may not surface, and REST delegates server-side
while SQL/Glue/Hive/Hadoop are client-side and may miss files outside the
location root. Today a downstream implementor has no signal about what
'physical table deletion' guarantees.
##########
catalog/internal/utils.go:
##########
@@ -241,3 +242,37 @@ func UpdateAndStageTable(ctx context.Context, current
*table.Table, ident table.
),
}, nil
}
+
+// PurgeTableFiles physically deletes all files under the table's warehouse
location using WalkDir.
+// It is best-effort — if interrupted, orphaned files may remain.
+func PurgeTableFiles(ctx context.Context, tbl *table.Table) (err error) {
+ fs, err := tbl.FS(ctx)
+ if err != nil {
+ return err
+ }
+ location := tbl.Metadata().Location()
+
+ var files []string
+ if listable, ok := fs.(icebergio.ListableIO); ok {
+ _ = listable.WalkDir(location, func(path string, d
stdfs.DirEntry, err error) error {
+ if err != nil {
+ return nil
+ }
+ if !d.IsDir() {
+ files = append(files, path)
+ }
+
+ return nil
+ })
+ }
Review Comment:
Every deletion error is silently dropped: the `WalkDir` err, the
`DeleteFiles` err, every individual `Remove`. Function declares `(err error)`
but always returns nil, so callers can't tell a no-op purge from success.
I'd collect with `errors.Join` and return. `BulkRemovableIO` also returns
the successfully-deleted list, worth surfacing when `len(deleted) < len(files)`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]