Hi Iceberg devs,

I opened https://github.com/apache/iceberg/issues/16744 to propose
improving commit failure messages when commit retries are exhausted.

Today, when a commit fails after exhausting its commit.retry.* budget,
Iceberg rethrows the underlying CommitFailedException, but the final
message does not say why the retry loop stopped. That makes it hard for
clients and operators to know which of these settings to tune:

   - commit.retry.num-retries
   - commit.retry.total-timeout-ms
   - commit.retry.min-wait-ms / commit.retry.max-wait-ms

I’d like to improve this by exposing whether commit retries stopped because
the attempt budget was exhausted, the total retry timeout was exceeded, or
both.

My current thinking is:

   - classify retry exhaustion in Tasks
   - preserve the original exception as the cause
   - translate the exhaustion reason into commit-specific guidance at
   commit call sites
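
As a rough sketch of the first two points, the classification could look
something like the following. Everything here is hypothetical and only for
discussion: the Reason enum, RetriesExhaustedException, and classify() do
not exist in Iceberg today, and the exact comparison used for each limit is
an assumption rather than a description of what Tasks currently does.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Iceberg today.
final class RetryExhaustionSketch {

  /** Why the retry loop gave up. Both reasons can apply at once. */
  enum Reason {
    ATTEMPTS_EXHAUSTED,
    TIMEOUT_EXCEEDED
  }

  /** Carries the exhaustion reasons and keeps the original failure as the cause. */
  static final class RetriesExhaustedException extends RuntimeException {
    private final Set<Reason> reasons;

    RetriesExhaustedException(Set<Reason> reasons, Throwable cause) {
      super("Retries exhausted: " + reasons, cause);
      this.reasons = reasons;
    }

    Set<Reason> reasons() {
      return reasons;
    }
  }

  /**
   * Classifies why retrying stopped. The thresholds (>= for both limits)
   * are assumptions made for this sketch.
   */
  static Set<Reason> classify(
      int attempts, int maxAttempts, long elapsedMs, long totalTimeoutMs) {
    Set<Reason> reasons = EnumSet.noneOf(Reason.class);
    if (attempts >= maxAttempts) {
      reasons.add(Reason.ATTEMPTS_EXHAUSTED);
    }
    if (elapsedMs >= totalTimeoutMs) {
      reasons.add(Reason.TIMEOUT_EXCEEDED);
    }
    return reasons;
  }
}
```

Returning a set rather than a single value keeps the "both" case explicit,
and wrapping with a cause means existing handling of the original exception
is not lost.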

For example, the final commit exception could include guidance like
“increase commit.retry.num-retries” when the attempt limit is reached, or
“increase commit.retry.total-timeout-ms” when the elapsed retry timeout is
reached.
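
To make the guidance idea concrete, here is a minimal sketch of the
translation step at a commit call site. The class, method names, and message
wording are all placeholders I made up for discussion, and a plain
RuntimeException stands in for CommitFailedException so the snippet is
self-contained; only the two property names come from Iceberg.

```java
// Hypothetical sketch; names and message wording are placeholders.
final class CommitGuidanceSketch {

  /** Maps the exhaustion reason(s) to a tuning hint, or "" if neither applies. */
  static String guidance(boolean attemptsExhausted, boolean timeoutExceeded) {
    if (attemptsExhausted && timeoutExceeded) {
      return "increase commit.retry.num-retries and commit.retry.total-timeout-ms";
    } else if (attemptsExhausted) {
      return "increase commit.retry.num-retries";
    } else if (timeoutExceeded) {
      return "increase commit.retry.total-timeout-ms";
    }
    return "";
  }

  /**
   * Wraps the original failure with a hint appended to its message.
   * The original exception is kept as the cause; if there is no hint,
   * the original is returned unchanged.
   */
  static RuntimeException withGuidance(
      RuntimeException original, boolean attemptsExhausted, boolean timeoutExceeded) {
    String hint = guidance(attemptsExhausted, timeoutExceeded);
    if (hint.isEmpty()) {
      return original;
    }
    String message =
        original.getMessage() + " (commit retries exhausted; consider: " + hint + ")";
    return new RuntimeException(message, original);
  }
}
```

Whether the hint belongs in the message itself or in a separate accessor on
the exception is one of the things I am unsure about.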

I’d appreciate feedback on whether this direction makes sense, especially
around where the retry exhaustion classification should live and how much
detail should be surfaced in the final exception message.

Thanks,
Joana
