elek opened a new pull request #1644:
URL: https://github.com/apache/ozone/pull/1644


   ## What changes were proposed in this pull request?
   
   Number of threads for closed container replications can be adjusted by the 
settings  `hdds.datanode.replication.streams.limit`. But this number is ignored 
today due to the misuse of `ThreadPoolExecutor`:
   
   ```java
   new ThreadPoolExecutor(
           0, poolSize, 60, TimeUnit.SECONDS,
           new LinkedBlockingQueue<>(),
           new ThreadFactoryBuilder().setDaemon(true)
               .setNameFormat("ContainerReplicationThread-%d")
               .build())
   ```
   
   Here the minimal number of threads is 0 and the maximum number of the 
threads is the configured value.  Threads in the thread pool supposed to be 
scaled up, but it doesn't.
   
   [From the JDK 
docs](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html#ThreadPoolExecutor(int,%20int,%20long,%20java.util.concurrent.TimeUnit,%20java.util.concurrent.BlockingQueue)):
   
   > A ThreadPoolExecutor will automatically adjust the pool size (see 
getPoolSize()) according to the bounds set by corePoolSize (see 
getCorePoolSize()) and maximumPoolSize (see getMaximumPoolSize()). When a new 
task is submitted in method execute(java.lang.Runnable), [...] [AND]  If there 
are more than corePoolSize but less than maximumPoolSize threads running, a new 
thread will be created only if the queue is full.
   
   So if queue is not full (and `LinkedBlockgingQueue` is unbounded by default) 
the threads will never be created.
   
   For a quick fix we can switch to use static thread pool instead of dynamic 
and always keep the required number of threads.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4535
   
   ## How was this patch tested?
   
   A new utility is committed as a unit test 
(`ReplicationSupervisorScheduling`). It helps to test the scheduling (and 
thread numbers can be checked by jstack) but it's not executed as part of the 
unit test (because it's slow and un-predictable) as it doesn't start with 
`Test*`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to