Copilot commented on code in PR #1124:
URL:
https://github.com/apache/skywalking-banyandb/pull/1124#discussion_r3212996739
##########
pkg/schema/cache.go:
##########
@@ -108,7 +109,29 @@ func (sr *schemaRepo) SendMetadataEvent(event
MetadataEvent) {
return
}
defer sr.closer.SenderDone()
+ // SendMetadataEvent runs synchronously on the caller goroutine (e.g. an
+ // etcd-watch handler such as (*schemaRepo).OnAddOrUpdate). A panic from
+ // processEvent — string-typed today, error-typed once Step 1b lands —
+ // would otherwise propagate to a caller that has no recover() of its
own.
+ // The classifier branch escalates permanent errors to .Fatal() so
+ // runtime registration of an incompatible group exits the process the
+ // same way the boot path does; non-permanent panics are logged and
+ // swallowed (no regression vs status quo).
+ defer func() {
+ if recovered := recover(); recovered != nil {
+ if recErr, isErr := recovered.(error); isErr &&
initerror.IsPermanent(recErr) {
+ sr.l.Fatal().Err(recErr).Interface("event",
event).Str("stack", string(debug.Stack())).
+ Msg("SendMetadataEvent hit a permanent
error, refusing to continue")
+ }
+ sr.l.Warn().Interface("err",
recovered).Interface("event", event).Str("stack", string(debug.Stack())).
+ Msg("recovered from SendMetadataEvent panic")
Review Comment:
The new SendMetadataEvent defer-recover swallows non-permanent panics (logs
a warning and then returns) without re-queuing the event. That means a
transient panic inside processEvent (or below it) will drop the metadata event
on the floor and leave the in-memory cache permanently stale. Consider
converting recovered non-permanent panics into an error and feeding it through
the existing transient-retry path (metrics + enqueue onto eventCh), or
re-panicking so the caller can handle/restart instead of silently losing the
update.
##########
banyand/internal/storage/tsdb.go:
##########
@@ -239,15 +240,31 @@ func OpenTSDB[T TSTable, O any](ctx context.Context, opts
TSDBOpts[T, O], cache
lockPath := filepath.Join(opts.Location, lockFilename)
lock, err := tsdbLfs.CreateLockFile(lockPath, FilePerm)
if err != nil {
- logger.Panicf("cannot create lock file %s: %s", lockPath, err)
+ panic(fmt.Errorf("cannot create lock file %s: %w", lockPath,
err))
}
db.lock = lock
+ // Release the lock fd if any subsequent step in OpenTSDB returns an
error.
+ // Otherwise the file descriptor leaks until process exit and a retry
on the
+ // same data dir hits "resource temporarily unavailable" from the next
+ // CreateLockFile call. Ownership transfers to (*database).Close() only
on
+ // success — released = true is set on the success return below.
+ released := false
+ defer func() {
+ if !released {
+ _ = lock.Close()
+ _ = tsdbLfs.DeleteFile(lockPath)
+ }
+ }()
if err := db.segmentController.open(); err != nil {
return nil, err
}
obsservice.MetricsCollector.Register(location, db.collect)
db.disableRotation = opts.DisableRotation
- return db, db.startRotationTask()
+ if err := db.startRotationTask(); err != nil {
+ return nil, err
+ }
+ released = true
+ return db, nil
Review Comment:
OpenTSDB now returns (nil, err) when startRotationTask() fails, but by that
point segmentController.open() has succeeded and
MetricsCollector.Register(location, db.collect) has already been called.
Because the partially-initialized db is no longer returned to the caller,
there’s no way for the caller to Close() and unregister/cleanup, so this error
path can leak resources and leave the metrics collector registered. Consider
cleaning up (e.g., unregister + close db/segmentController/scheduler) before
returning the error, or preserve the previous behavior of returning the db
alongside the error so the caller can close it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]