wardlican commented on PR #3969: URL: https://github.com/apache/amoro/pull/3969#issuecomment-4190397372
> Thanks for the contribution! After reviewing the current master branch architecture, I think this request forwarding mechanism may no longer be necessary.
> Here's the reasoning:
>
> **Why forwarding is not needed:**
>
> 1. **All write operations go directly to DB**: `createCatalog`, `dropCatalog`, `updateCatalog`, and table creation all write to the database directly. No in-memory leader-exclusive state is involved. Any node can handle these requests independently.
> 2. **Optimizing status is fully persisted**: Every `OptimizingStatus` transition is written to `table_runtime.status_code` synchronously (via `StatedPersistentBase.invokeConsistency()`). `TableController` already reads status directly from DB via `TableRuntimeMapper.selectRuntime()`, not from in-memory `tableRuntimeMap`. Any node can serve these queries correctly.
> 3. **Optimizing process history is also DB-backed**: `getOptimizingProcesses` queries `optimizing_process` table directly. No forwarding needed.
> 4. **`DefaultTableService` handles cross-node table sync natively**: In master-slave mode, each node periodically syncs its assigned bucket tables via `syncBucketTables()`. The bucket routing is already handled at the data layer, not the HTTP layer.
>
> The only purely in-memory state is the optimizer heartbeat tracking and in-flight task assignment inside `OptimizingQueue`, but these are operational monitoring concerns, not correctness-critical for API responses.
>
> Given this, adding an HTTP proxy layer with circuit breakers, retry logic, and exception-as-control-flow introduces significant complexity without a clear benefit. A simpler approach would be to route Dashboard traffic through a load balancer.
>
> Would be happy to discuss further if there are specific cases I missed.

Yes. All current state is read from the database, and every state update is written back to it. Since every AMS node synchronizes its state from the database, this request-forwarding feature is not required.

--
This is an automated message from the Apache Git Service.
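The argument in this thread rests on a write-through pattern: each status transition is persisted synchronously, and API reads go to the database, so it does not matter which node handled the transition. The Java sketch below illustrates that idea under stated assumptions: `SharedStore` is a hypothetical in-memory stand-in for the `table_runtime` table, `Status` is a hypothetical enum rather than Amoro's `OptimizingStatus`, and `AmsNode.transition` only mimics the persist-then-cache ordering. It does not reproduce `StatedPersistentBase.invokeConsistency()` and omits transactions and failure handling.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status enum for illustration; not Amoro's OptimizingStatus.
enum Status { IDLE, PENDING, PLANNING, COMMITTING }

// Hypothetical stand-in for the shared table_runtime table: every node sees the same rows.
final class SharedStore {
    private final Map<Long, Status> statusByTable = new ConcurrentHashMap<>();

    void writeStatus(long tableId, Status status) {
        statusByTable.put(tableId, status);
    }

    Optional<Status> readStatus(long tableId) {
        return Optional.ofNullable(statusByTable.get(tableId));
    }
}

// Hypothetical AMS node; names and methods are illustrative only.
final class AmsNode {
    private final String name;
    private final SharedStore store;
    // Local cache, updated only after the shared-store write has completed.
    private final Map<Long, Status> localCache = new ConcurrentHashMap<>();

    AmsNode(String name, SharedStore store) {
        this.name = name;
        this.store = store;
    }

    // Write-through: persist to the shared store first, then update in-memory state.
    void transition(long tableId, Status next) {
        store.writeStatus(tableId, next);
        localCache.put(tableId, next);
    }

    // API reads consult the shared store, never the local cache,
    // so every node returns the same answer.
    Optional<Status> handleStatusQuery(long tableId) {
        return store.readStatus(tableId);
    }

    String name() {
        return name;
    }
}

class WriteThroughDemo {
    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        AmsNode owner = new AmsNode("ams-1", store);
        AmsNode other = new AmsNode("ams-2", store);

        owner.transition(42L, Status.PLANNING);
        // ams-2 never applied the transition in memory, but still answers correctly.
        System.out.println(other.name() + " sees " + other.handleStatusQuery(42L).orElse(null));
    }
}
```

Running `main` prints `ams-2 sees PLANNING`. Because reads never consult `localCache`, the second node needs no forwarded request to answer correctly, which is the point made in the reply.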
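Point 4 of the quoted review says each node periodically syncs only its assigned bucket tables. The thread does not describe how `syncBucketTables()` maps tables to buckets, so the following Java sketch is hypothetical: `BucketRouting`, the `hashCode`-based bucket function, the bucket count of 4, and the two-node bucket assignment are all illustrative assumptions, not Amoro's implementation. It shows how disjoint bucket sets partition the table list so that each table is synced by exactly one node.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical bucket routing: decides which tables a given AMS node syncs.
final class BucketRouting {
    private final int bucketCount;

    BucketRouting(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        this.bucketCount = bucketCount;
    }

    // Map a table identifier to a bucket; floorMod keeps the result in [0, bucketCount).
    int bucketOf(String tableIdentifier) {
        return Math.floorMod(tableIdentifier.hashCode(), bucketCount);
    }

    // Keep only the tables whose bucket is assigned to this node.
    List<String> tablesToSync(List<String> allTables, Set<Integer> assignedBuckets) {
        return allTables.stream()
                .filter(t -> assignedBuckets.contains(bucketOf(t)))
                .collect(Collectors.toList());
    }
}

class BucketRoutingDemo {
    public static void main(String[] args) {
        BucketRouting routing = new BucketRouting(4);
        List<String> tables = List.of("db.orders", "db.users", "db.events", "db.logs");
        // Two hypothetical nodes split the four buckets between them.
        List<String> node1 = routing.tablesToSync(tables, Set.of(0, 1));
        List<String> node2 = routing.tablesToSync(tables, Set.of(2, 3));
        System.out.println("node1=" + node1 + " node2=" + node2);
    }
}
```

Because the bucket sets are disjoint and together cover every bucket, each table lands in exactly one node's sync list; routing of this kind happens at the data layer, independent of which node receives an HTTP request.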
