kenhuuu opened a new pull request, #3484:
URL: https://github.com/apache/tinkerpop/pull/3484

   ## Bound HTTP transaction lifetime: suspend idle timer while busy + add 
`maxTransactionLifetime`
   
   ### What changed
   
   Two related changes to how Gremlin Server bounds the lifetime of an HTTP 
transaction (each open transaction owns a dedicated worker thread and a 
`maxConcurrentTransactions` slot, so an unbounded transaction is a real resource
   leak).
   
   **Idle timer suspends while busy.** The `transactionTimeout` idle timer was 
armed on request *arrival*, so a single operation running longer than the 
timeout would trip it mid-execution and roll back a perfectly healthy
   transaction. It now arms only when the transaction goes genuinely idle (no 
operation running or queued) and is suspended while work is in flight — a long 
operation is bounded by `evaluationTimeout`, not the idle timer. The
   setting is renamed `transactionTimeout` → `idleTransactionTimeout` to 
reflect what it actually means, and `0` now correctly disables it (the docs 
always claimed this; the code never honored it).
   
   **New `maxTransactionLifetime` absolute cap.** A new setting that bounds 
*total* transaction age regardless of activity, closing the gap where a client 
holds a transaction open indefinitely via one long operation or
   keep-alive drips. When it fires it interrupts the running operation and 
rolls the transaction back, and the in-flight client receives a 
transaction-timeout (504) rather than a misleading "increase evaluationTimeout" 
error.
   
   ### Why
   
   The committed idle-timer behavior acted more like a per request timeout 
instead of the described idle timeout and there was no ceiling on transaction 
lifetime at all. Together these give three composable bounds —
   per-operation (`evaluationTimeout`), between-operations 
(`idleTransactionTimeout`), and whole-transaction (`maxTransactionLifetime`) — 
mirroring PostgreSQL's `statement_timeout` / 
`idle_in_transaction_session_timeout` /
   `transaction_timeout`.
   
   Notably, the server **does not validate** these settings or reject begins 
when bounds are disabled. Instead it ships sane defaults (idle 1 min, lifetime 
10 min): a transaction is bounded out of the box, disabling the bounds
   is a deliberate operator choice, and a client's per-request `timeoutMs` is 
always honored as sent rather than second-guessed.
   
   ### Review guide
   
   - **`UnmanagedTransaction` — the executor swap.** The single-thread executor 
is now a `ThreadPoolExecutor(1,1)` subclass purely to expose 
`beforeExecute`/`afterExecute` + the queue, which drive the suspend-while-busy 
logic.
   The key invariant: **submitted tasks must not be wrapped** — `submit()` 
returns the same `FutureTask` so the eval-timeout / cap `cancel(true)` 
interrupts the real work.
   - **Concurrency on the idle/cap timers.** `maybeScheduleIdleTimer` re-checks 
`accepting` *after* arming (so a concurrent `close()` can't be raced into 
re-arming a dying transaction); the in-flight op is tracked as a single
   immutable `Running(future, context)` pair so the cap never flags one 
operation's `Context` while interrupting another's future.
   - **Ownership split (intentional asymmetry).** The idle timer lives in 
`UnmanagedTransaction` (it must see the executor hooks); the lifetime cap is 
scheduled/cancelled by `TransactionManager` (a fixed schedule tied to
   registry membership). The cap is armed **after** `putIfAbsent` so it can 
never fire into an unregistered transaction and leak a thread; `destroy()` 
cancels it on every close path.
   - **`close()` ordering is still load-bearing** — `manager.destroy()` before 
`executor.shutdown()`, graceful `shutdown()` (not `shutdownNow()`). The cap 
path reuses this exact path; verify it's unchanged.
   - **Error mapping** — cap-kill → 504 `TransactionException` via a 
`Context.closedByLifetimeCap` flag set before the interrupt; ordinary eval 
timeout still → 500. Both the eval-timeout writer and 
`formErrorResponseMessage` are
   cap-aware so the code is correct regardless of which thread writes the 
response first.
   - **Tests** — deterministic timer behavior is unit-tested via a 
virtual-clock `ManualScheduledExecutorService` (no `Thread.sleep` flakiness); 
integration tests assert the guarantee actually made (transaction reclaimed /
   subsequent 404), since timing can't reliably catch the mid-op interrupt.
   
   <!--
   Thanks for contributing! Reminders:
   + TARGET the earliest branch where you want the change
       3.7-dev -> 3.7.7 (non-breaking only)
       3.8-dev -> 3.8.2 (non-breaking only)
       master  -> 4.0.0
   + Committers will MERGE the PR forward to newer versions
   + ADD entry to the CHANGELOG.asciidoc for the targeted version
       Do not reference a JIRA number there
   + ADD JIRA number to title and link in description
   + PRs requires 3 +1s from committers OR
                  1 +1 and 7 day wait to merge.
   + MORE details: https://s.apache.org/rtnal
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to