Jaehui Lee created HBASE-29781:
----------------------------------

             Summary: Dynamic configurations for call queue length doesn't work 
correctly when increasing the limit
                 Key: HBASE-29781
                 URL: https://issues.apache.org/jira/browse/HBASE-29781
             Project: HBase
          Issue Type: Bug
          Components: rpc
            Reporter: Jaehui Lee
            Assignee: Jaehui Lee


h2. Problem

The dynamic configurations for call queue length (such as 
{{ipc.server.max.callqueue.length}}, 
{{ipc.server.priority.max.callqueue.length}}, 
{{ipc.server.replication.max.callqueue.length}}, and 
{{ipc.server.bulkload.max.callqueue.length}}) take effect only when *decreasing* 
the limit. When *increasing* the limit, the configuration change has no effect: 
tasks are still rejected at the original hard limit.

*Example:*
 * Initial configuration: {{ipc.server.max.callqueue.length = 100}}
 * Change configuration to: {{ipc.server.max.callqueue.length = 200}}
 * Expected: Queue accepts up to 200 tasks
 * Actual: Queue still rejects tasks at 100

h2. Root Cause

{{RpcExecutor}} uses two limit mechanisms (introduced by HBASE-15306):
 # *Soft limit* ({{currentQueueLimit}} variable): Updated by 
{{resizeQueues()}} when the configuration changes
 # *Hard limit* ({{BlockingQueue}} capacity): Set once during queue 
initialization and *cannot be changed*

 
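During initialization, queues are created with a fixed capacity based on the 
initial configuration value. When the configuration changes, {{resizeQueues()}} 
only updates the {{currentQueueLimit}} variable but cannot modify the 
underlying {{BlockingQueue}} capacity, which is immutable. Lowering the limit 
therefore works, because the soft limit is then the stricter of the two, but 
raising it above the initial capacity has no effect, because the queue itself 
still rejects tasks once it is full.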
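
The behavior can be reproduced outside HBase with a minimal sketch. This is a simplified model, not the actual {{RpcExecutor}} code: the class {{FixedCapacityDemo}} and its {{dispatch()}} and {{fill()}} methods are hypothetical, and {{LinkedBlockingQueue}} is assumed as the bounded queue implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of the two-limit design; not the real RpcExecutor.
class FixedCapacityDemo {
  // Hard limit: the capacity is fixed when the queue is constructed.
  private final BlockingQueue<Integer> queue;
  // Soft limit: the only value a configuration change can update.
  private volatile int currentQueueLimit;

  FixedCapacityDemo(int initialLimit) {
    this.queue = new LinkedBlockingQueue<>(initialLimit);
    this.currentQueueLimit = initialLimit;
  }

  // Plays the role of resizeQueues(): only the soft limit changes.
  void resizeQueues(int newLimit) {
    this.currentQueueLimit = newLimit;
  }

  // Returns true if the task was accepted.
  boolean dispatch(int task) {
    if (queue.size() >= currentQueueLimit) {
      return false; // rejected by the soft limit
    }
    return queue.offer(task); // returns false once the fixed capacity is reached
  }

  // Offers `attempts` tasks and returns how many were accepted.
  int fill(int attempts) {
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
      if (dispatch(i)) {
        accepted++;
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    FixedCapacityDemo demo = new FixedCapacityDemo(100);
    demo.resizeQueues(200); // raise the soft limit only
    // Prints 100: the queue capacity still caps acceptance.
    System.out.println(demo.fill(250));
  }
}
```

Lowering the limit in the same sketch (for example {{resizeQueues(50)}} on an empty queue) does take effect, which matches the asymmetry described above.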

h2. Proposed Solutions

*Option 1: Set hard limit to Integer.MAX_VALUE (or a sufficiently large value)*

Modify {{initializeQueues()}} to set the queue capacity to 
{{Integer.MAX_VALUE}} instead of the configured value, and rely solely on 
{{currentQueueLimit}} for enforcement. This is simple and enables dynamic 
resizing in both directions. Note that this may allow slight overshooting of 
the soft limit due to race conditions in concurrent dispatch.
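
A rough sketch of Option 1, under the same assumptions as a simplified model rather than the real {{RpcExecutor}}: the class {{SoftLimitOnlyQueue}} and its methods are hypothetical names. The queue is created with {{Integer.MAX_VALUE}} capacity, so only the soft limit decides acceptance.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of Option 1: effectively unbounded queue, soft limit only.
class SoftLimitOnlyQueue {
  private final BlockingQueue<Integer> queue =
      new LinkedBlockingQueue<>(Integer.MAX_VALUE);
  private volatile int currentQueueLimit;

  SoftLimitOnlyQueue(int initialLimit) {
    this.currentQueueLimit = initialLimit;
  }

  // Resizing now works in both directions because nothing else caps the queue.
  void resizeQueues(int newLimit) {
    this.currentQueueLimit = newLimit;
  }

  boolean dispatch(int task) {
    // Check-then-act: size() and offer() are not one atomic step, so
    // concurrent callers may briefly push the queue past the soft limit.
    if (queue.size() >= currentQueueLimit) {
      return false;
    }
    return queue.offer(task);
  }

  int size() {
    return queue.size();
  }

  // Offers `attempts` tasks and returns how many were accepted.
  int fill(int attempts) {
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
      if (dispatch(i)) {
        accepted++;
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    SoftLimitOnlyQueue q = new SoftLimitOnlyQueue(100);
    q.fill(250);          // accepts 100
    q.resizeQueues(200);  // raise the limit
    q.fill(250);          // accepts 100 more
    System.out.println(q.size()); // prints 200 in this single-threaded run
  }
}
```

The comment in {{dispatch()}} marks where the overshoot mentioned above can occur when several threads dispatch at once.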

*Option 2: Recreate queues when increasing capacity*

When {{resizeQueues()}} detects an increase in the limit, drain the existing 
queues and create new ones with the larger capacity. This is more complex but 
preserves hard limit safety.
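
A rough single-queue sketch of Option 2. The class {{RecreatingQueueDemo}} is hypothetical, and coarse {{synchronized}} methods stand in for whatever coordination the real handler threads would need. Much of the extra complexity would come from consumers that are blocked on the old queue while it is being swapped, which this sketch does not model.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of Option 2: replace the queue when the limit grows.
class RecreatingQueueDemo {
  private BlockingQueue<Integer> queue;
  private int capacity;          // hard limit of the current queue
  private int currentQueueLimit; // soft limit, still used for decreases

  RecreatingQueueDemo(int initialLimit) {
    this.queue = new LinkedBlockingQueue<>(initialLimit);
    this.capacity = initialLimit;
    this.currentQueueLimit = initialLimit;
  }

  synchronized void resizeQueues(int newLimit) {
    if (newLimit > capacity) {
      // Build a larger queue and move pending tasks over, keeping their order.
      BlockingQueue<Integer> larger = new LinkedBlockingQueue<>(newLimit);
      queue.drainTo(larger);
      queue = larger;
      capacity = newLimit;
    }
    currentQueueLimit = newLimit;
  }

  synchronized boolean dispatch(int task) {
    return queue.size() < currentQueueLimit && queue.offer(task);
  }

  synchronized int size() {
    return queue.size();
  }

  // Offers `attempts` tasks and returns how many were accepted.
  int fill(int attempts) {
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
      if (dispatch(i)) {
        accepted++;
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    RecreatingQueueDemo q = new RecreatingQueueDemo(100);
    q.fill(250);         // accepts 100
    q.resizeQueues(200); // queue is recreated with capacity 200
    q.fill(250);         // accepts 100 more
    System.out.println(q.size()); // prints 200
  }
}
```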

*Option 3: Use Semaphore for limit enforcement*

Maintain a separate {{Semaphore}} per queue for atomic limit control. This 
eliminates race conditions but adds overhead.
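
A rough sketch of Option 3, again as a simplified model with hypothetical names ({{SemaphoreLimitedQueue}}, {{ResizableSemaphore}}). Each queued task holds one permit, so admission is a single atomic {{tryAcquire()}}. Because {{Semaphore.reducePermits()}} is protected, the sketch exposes it through a small subclass in order to lower the limit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of Option 3: a semaphore enforces the queue limit.
class SemaphoreLimitedQueue {
  // Semaphore.reducePermits is protected, so expose it via a subclass.
  private static final class ResizableSemaphore extends Semaphore {
    ResizableSemaphore(int permits) {
      super(permits);
    }

    void shrink(int reduction) {
      super.reducePermits(reduction);
    }
  }

  private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
  private final ResizableSemaphore permits;
  private int limit;

  SemaphoreLimitedQueue(int initialLimit) {
    this.permits = new ResizableSemaphore(initialLimit);
    this.limit = initialLimit;
  }

  boolean dispatch(int task) {
    if (!permits.tryAcquire()) {
      return false; // no free slot under the current limit
    }
    queue.add(task); // the queue itself is unbounded
    return true;
  }

  // Consumer side: taking a task frees its slot.
  Integer poll() {
    Integer task = queue.poll();
    if (task != null) {
      permits.release();
    }
    return task;
  }

  synchronized void resizeQueues(int newLimit) {
    int delta = newLimit - limit;
    if (delta > 0) {
      permits.release(delta); // grow: add free slots
    } else if (delta < 0) {
      permits.shrink(-delta); // shrink: available permits may go negative
    }
    limit = newLimit;
  }

  // Offers `attempts` tasks and returns how many were accepted.
  int fill(int attempts) {
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
      if (dispatch(i)) {
        accepted++;
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    SemaphoreLimitedQueue q = new SemaphoreLimitedQueue(100);
    q.fill(250);         // accepts 100
    q.resizeQueues(200); // releases 100 extra permits
    System.out.println(q.fill(250)); // prints 100
  }
}
```

After a decrease, new tasks are rejected until consumers have drained the queue below the new limit. The overhead mentioned above corresponds to the extra acquire and release on every task.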

 

I'm uncertain whether this behavior is intentional or needs fixing. Is this 
something that should be addressed? If so, which approach would be most 
appropriate for HBase's architecture? Any feedback would be greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
