shtripat opened a new issue, #757:
URL: https://github.com/apache/iceberg-go/issues/757
### Feature Request / Improvement
## Summary
The iceberg-go SDK has several code paths that use panic() to report
validation errors instead of returning an error. This is normally masked by the
defer recover() inside Visit[T] / PreOrderVisit[T], but:
1. The recovery wraps string panics without ErrInvalidSchema, making
errors.Is useless for callers.
2. The recovery adds an "error encountered during schema visitor: ..."
prefix that leaks the internal panic/recover mechanism.
3. NestedField.UnmarshalJSON performs no validation on required fields (id,
name), so callers using wrong JSON key names (e.g. "field_id" instead of "id")
get silent zero-values that then trigger the panics later.
These three issues combine into a chain where ordinary bad JSON input turns
into a non-actionable, opaque error deep in schema traversal.
### Affected Code
| Location |
Panic / Bug |
|-----------------------------------------------------|-------------------------------------------------------------------------------------|
| schema.go:1479 — validAvroName | panic("cannot validate empty name") —
string panic; not wrapped with ErrInvalidSchema |
| schema.go:837 — addField | panic(fmt.Errorf("%w: multiple fields for name
…", ErrInvalidSchema, …)) |
| schema.go:991,1020,1053 — pruneColVisitor.Field/List/Map |
panic(fmt.Errorf("%w: cannot explicitly project …", ErrInvalidSchema, …)) |
| schema.go:1071 — pruneColVisitor.projectSelectedStruct | panic("expected
a struct") — bare string panic |
| types.go — NestedField.UnmarshalJSON | No validation: unknown JSON keys
silently ignored; required fields id/name left at zero-value |
### Reproduction
A self-contained reproduction test file is attached below. Add
panic_repro_test.go to the repo root and run:
`go test -v -run "TestBug_"`
All 5 tests pass on main, each one proving the presence of a specific bug.
```
// panic_repro_test.go
package iceberg_test
import (
"encoding/json"
"errors"
"testing"
"github.com/apache/iceberg-go"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// Bug 1a — json.Unmarshal silently accepts "field_id" (wrong key for
"id").
// ID is left as 0; no error is returned.
func TestBug_JSONSilentIgnore_WrongIDKey(t *testing.T) {
data :=
[]byte(`{"field_id":1,"name":"col","type":"string","required":false}`)
var f iceberg.NestedField
err := json.Unmarshal(data, &f)
assert.NoError(t, err) // passes — silent failure, no validation
assert.Equal(t, 0, f.ID) // passes — ID is zero, "field_id":1 was
discarded
}
// Bug 1b — json.Unmarshal silently accepts "field_name" (wrong key for
"name").
// Name is left as ""; no error is returned.
func TestBug_JSONSilentIgnore_WrongNameKey(t *testing.T) {
data :=
[]byte(`{"id":1,"field_name":"col","type":"string","required":false}`)
var f iceberg.NestedField
err := json.Unmarshal(data, &f)
assert.NoError(t, err) // passes — silent failure
assert.Equal(t, "", f.Name) // passes — Name is empty,
"field_name":"col" was discarded
}
// Bug 2 — validAvroName("") panics with a bare string, caught by Visit's
recover.
// The error is NOT wrapped with ErrInvalidSchema; errors.Is() returns
false.
func TestBug_EmptyFieldName_PanicAsOpaqueError(t *testing.T) {
sc := iceberg.NewSchema(1,
iceberg.NestedField{ID: 1, Name: "", Type:
iceberg.PrimitiveTypes.String, Required: false},
)
_, err := iceberg.SanitizeColumnNames(sc)
require.Error(t, err)
assert.False(t, errors.Is(err, iceberg.ErrInvalidSchema))
// passes — not typed
assert.Contains(t, err.Error(), "cannot validate empty name")
// passes — leaks panic string
assert.Contains(t, err.Error(), "error encountered during schema
visitor") // passes — leaks recover wrapper
}
// Bug 3 — full cascade: wrong JSON key → empty Name → panic in schema
visitor.
func TestBug_EmptyName_FromBadJSON_CascadesToPanic(t *testing.T) {
data :=
[]byte(`{"id":1,"field_name":"col","type":"string","required":false}`)
var f iceberg.NestedField
err := json.Unmarshal(data, &f)
require.NoError(t, err)
require.Equal(t, "", f.Name) // Name silently empty from wrong JSON
key
sc := iceberg.NewSchema(1, f)
_, sanitizeErr := iceberg.SanitizeColumnNames(sc)
require.Error(t, sanitizeErr)
assert.False(t, errors.Is(sanitizeErr, iceberg.ErrInvalidSchema)) //
passes — panic string, not typed
assert.Contains(t, sanitizeErr.Error(), "cannot validate empty name")
}
// Bug 4 — addField panics on duplicate field names; panic is caught by
Visit's
// recover and wrapped with an extra "error encountered during schema
visitor:" prefix.
func TestBug_DuplicateFieldNames_PanicWrapped(t *testing.T) {
sc := iceberg.NewSchema(1,
iceberg.NestedField{ID: 1, Name: "col", Type:
iceberg.PrimitiveTypes.String, Required: false},
iceberg.NestedField{ID: 2, Name: "col", Type:
iceberg.PrimitiveTypes.Int32, Required: false},
)
_, err := iceberg.IndexByName(sc)
require.Error(t, err)
assert.ErrorIs(t, err, iceberg.ErrInvalidSchema)
// passes — error panic preserves %w
assert.Contains(t, err.Error(), "error encountered during schema
visitor") // passes — leaks recover prefix
}
```
### Expected Behaviour
| Scenario |
Expected |
|----------------------------|----------------------------------------------------------------------------------------------------------|
| json.Unmarshal with unknown key "field_id" | Return an error (or at
minimum, callers should be able to detect the zero-value ID) |
| SanitizeColumnNames with empty Name | Return fmt.Errorf("%w: …",
ErrInvalidSchema, …) — no panic, no opaque string |
| IndexByName with duplicate names | Return fmt.Errorf("%w: …",
ErrInvalidSchema, …) — no panic, no extra wrapper prefix |
| All schema-visitor errors | Wrapped with ErrInvalidSchema so errors.Is
works; no "error encountered during schema visitor" prefix visible to callers |
### Actual Behaviour
- validAvroName("") at schema.go:1479 calls panic("cannot validate empty
name"). Because the panic value is a plain string, Visit's recover() wraps it
as fmt.Errorf("error
encountered during schema visitor: %s", e) — without %w — so
errors.Is(err, ErrInvalidSchema) is false.
- addField() at schema.go:837 calls panic(fmt.Errorf("%w: …",
ErrInvalidSchema, …)). Visit's recover wraps it with "error encountered during
schema visitor: …" and uses %w, so
errors.Is works, but the caller-visible message still exposes the internal
panic/recover mechanism.
- pruneColVisitor methods at schema.go:991,1020,1053,1071 have the same
problem.
- NestedField.UnmarshalJSON never validates that ID != 0 or Name != "" after
decoding, so incorrect JSON keys silently produce zero-value fields that later
panic inside visitors.
### Suggested Fix Direction
1. NestedField.UnmarshalJSON — after decoding, verify ID != 0 and Name !=
"", returning fmt.Errorf("%w: …", ErrInvalidSchema) on failure.
2. validAvroName — replace panic("cannot validate empty name") with return
false.
3. sanitizeName — add an empty-string guard before n[0] access.
4. indexByName.addField /
pruneColVisitor.Field|List|Map|projectSelectedStruct — use error accumulation
(add an err error field to the visitor struct, set it instead of panicking,
check it after Visit returns). The SchemaVisitor[T] generic interface does not
permit methods to return errors directly, so accumulation is the correct
approach.
This removes all internal panics from validation paths, makes every error
carry ErrInvalidSchema, eliminates the "error encountered during schema
visitor:" prefix from caller-visible messages, and enables reliable errors.Is
checks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]