laskoviymishka commented on code in PR #1185:
URL: https://github.com/apache/iceberg-go/pull/1185#discussion_r3417888615


##########
catalog/hadoop/hadoop.go:
##########
@@ -723,3 +772,26 @@ func (c *Catalog) LoadNamespaceProperties(_ 
context.Context, ns table.Identifier
 func (c *Catalog) UpdateNamespaceProperties(_ context.Context, _ 
table.Identifier, _ []string, _ iceberg.Properties) 
(catalog.PropertiesUpdateSummary, error) {
        return catalog.PropertiesUpdateSummary{}, errors.New("hadoop catalog: 
UpdateNamespaceProperties not yet implemented")
 }
+
+// CheckedMkdirAll is a helper function that checks
+// all subdirectories of a given identifier path exist before creating the 
full path.
+// This function is not atomic and may not return ErrNamespaceAlreadyExists if
+// called concurrently with other calls that change the same identifier
+func (c *Catalog) CheckedMkdirAll(id table.Identifier) error {

Review Comment:
   I'd unexport this to `checkedMkdirAll`. The only caller is `CreateNamespace` 
in this same package, it bypasses `validateIdentifier`, and exporting it bakes 
an internal implementation detail into the stable public API right as we're 
trying to slim the surface.



##########
catalog/hadoop/hadoop.go:
##########
@@ -723,3 +772,26 @@ func (c *Catalog) LoadNamespaceProperties(_ 
context.Context, ns table.Identifier
 func (c *Catalog) UpdateNamespaceProperties(_ context.Context, _ 
table.Identifier, _ []string, _ iceberg.Properties) 
(catalog.PropertiesUpdateSummary, error) {
        return catalog.PropertiesUpdateSummary{}, errors.New("hadoop catalog: 
UpdateNamespaceProperties not yet implemented")
 }
+
+// CheckedMkdirAll is a helper function that checks
+// all subdirectories of a given identifier path exist before creating the 
full path.
+// This function is not atomic and may not return ErrNamespaceAlreadyExists if
+// called concurrently with other calls that change the same identifier
+func (c *Catalog) CheckedMkdirAll(id table.Identifier) error {
+       path := c.namespaceToPath(id)
+       for pathIndex := range id {

Review Comment:
   I think there's a real bug in this loop. `pathIndex` starts at 0, so the 
first iteration uses `id[:0]` — the empty identifier — and stats the warehouse 
root rather than the first namespace component. For a single-component 
namespace like `["ns"]` we never verify any actual parent, and we redundantly 
stat the warehouse on every call.
   
   I'd start at 1: `for pathIndex := 1; pathIndex < len(id); pathIndex++`. 
`TestCreateNamespaceNestedParentMissing` only passes today because the 
warehouse always exists in the fixture, so the intended invariant ("all parent 
namespaces must exist") isn't really being checked. wdyt?



##########
catalog/hadoop/hadoop.go:
##########
@@ -723,3 +772,26 @@ func (c *Catalog) LoadNamespaceProperties(_ 
context.Context, ns table.Identifier
 func (c *Catalog) UpdateNamespaceProperties(_ context.Context, _ 
table.Identifier, _ []string, _ iceberg.Properties) 
(catalog.PropertiesUpdateSummary, error) {
        return catalog.PropertiesUpdateSummary{}, errors.New("hadoop catalog: 
UpdateNamespaceProperties not yet implemented")
 }
+
+// CheckedMkdirAll is a helper function that checks
+// all subdirectories of a given identifier path exist before creating the 
full path.
+// This function is not atomic and may not return ErrNamespaceAlreadyExists if
+// called concurrently with other calls that change the same identifier
+func (c *Catalog) CheckedMkdirAll(id table.Identifier) error {
+       path := c.namespaceToPath(id)
+       for pathIndex := range id {
+               subPath := id[:pathIndex]
+               parentPath := c.namespaceToPath(subPath)
+               if _, err := c.filesystem.Stat(parentPath); err != nil {

Review Comment:
   Returning the raw `Stat` error here breaks the `ErrNoSuchNamespace` contract 
that `CreateNamespace` used to hold. The caller only catches `fs.ErrExist`; 
everything else — including the `fs.ErrNotExist` from a missing parent — falls 
through to the generic `fmt.Errorf`, so the error chain no longer contains 
`catalog.ErrNoSuchNamespace`.
   
   The old `CreateNamespace` had an explicit `if errors.Is(err, fs.ErrNotExist) 
{ ... catalog.ErrNoSuchNamespace ... }` branch that this PR drops. I'd re-add 
that branch in `CreateNamespace` (alongside the existing `ErrExist` check), 
which covers both the intermediate-parent and missing-warehouse-root cases in 
one spot. It'd also help to record which component failed so the message isn't 
misleading.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to