[ 
https://issues.apache.org/jira/browse/HDDS-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duong updated HDDS-7936:
------------------------
    Description: 
Today, by default SCM creates 10 in-memory blocking queues for the container 
report handlers (each is bound with a worker thread). The queues are shared for 
Full Container Report (FCR) and Incremental Container Report (ICR).  By 
default, the max_capacity is set to *100,000* per queue, 
[ref|https://github.com/apache/ozone/blob/master/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java#L499-L499].
{code:java}
public static List<BlockingQueue<ContainerReport>> initContainerReportQueue(
    OzoneConfiguration configuration) {
  int threadPoolSize = configuration.getInt(getContainerReportConfPrefix()
          + ".thread.pool.size",
      OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT);
  int queueSize = configuration.getInt(getContainerReportConfPrefix()
          + ".queue.size",
      OZONE_SCM_EVENT_CONTAINER_REPORT_QUEUE_SIZE_DEFAULT);
  List<BlockingQueue<ContainerReport>> queues = new ArrayList<>();
  for (int i = 0; i < threadPoolSize; ++i) {
    queues.add(new ContainerReportQueue(queueSize));
  }
  return queues;
} {code}
That indicates a few potential problems:
 # The number of possible queued messages can reach {*}1M{*}, which is 
{*}HUGE{*}. Assuming each ICR message is 10KB, 1M ICR messages are 10GB. FCRs 
are much bigger as each many container reports, and usually is a matter of 
{*}MBs{*}. 10K FCR messages are enough to swallow all SCM heap memory.
Such a default configuration indeed doesn't limit any boundary for the 
container report queues. And that opens a good chance for SCM to choke and 
crash if it slows down the consumption of reports at any point, or in any event 
that many datanodes send ICR at the same time.
 # As mentioned above, ICR and FCR are very different in size (and probably in 
the ability of SCM to process). It's not good to use the same queue for both 
types. Separation is needed to differentiate queue size control for them.

To better protect:
 # Separate ICR and FCR queues.
 # Set the right default boundary for each queue type. *5000* (in total) ** for 
{*}FCR{*}, and *20,000* for *ICR* can be a good start.
 # Do we really need multiple queues per message type? The BlockingQueue is 
designed for the scene of using a single queue to sever multiple 
producers/consumers. If there's no particular reason to do so, I suggest 
simplifying the model to use a single queue for a message handler.

 

 

  was:
Today, by default SCM creates 10 in-memory blocking queues for the container 
report handlers (each is bound with a worker thread). The queues are shared for 
Full Container Report (FCR) and Incremental Container Report (ICR).  By 
default, the max_capacity is set t *100,000* per queue, .
{code:java}
public static List<BlockingQueue<ContainerReport>> initContainerReportQueue(
    OzoneConfiguration configuration) {
  int threadPoolSize = configuration.getInt(getContainerReportConfPrefix()
          + ".thread.pool.size",
      OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT);
  int queueSize = configuration.getInt(getContainerReportConfPrefix()
          + ".queue.size",
      OZONE_SCM_EVENT_CONTAINER_REPORT_QUEUE_SIZE_DEFAULT);
  List<BlockingQueue<ContainerReport>> queues = new ArrayList<>();
  for (int i = 0; i < threadPoolSize; ++i) {
    queues.add(new ContainerReportQueue(queueSize));
  }
  return queues;
} {code}
That indicates a few potential problems:
 # The number of possible queued messages can reach {*}1M{*}, which is 
{*}HUGE{*}. Assuming each ICR message is 10KB, 1M ICR messages are 10GB. FCRs 
are much bigger as each many container reports, and usually is a matter of 
{*}MBs{*}. 10K FCR messages are enough to swallow all SCM heap memory.
Such a default configuration indeed doesn't limit any boundary for the 
container report queues. And that opens a good chance for SCM to choke and 
crash if it slows down the consumption of reports at any point, or in any event 
that many datanodes send ICR at the same time.
 # As mentioned above, ICR and FCR are very different in size (and probably in 
the ability of SCM to process). It's not good to use the same queue for both 
types. Separation is needed to differentiate queue size control for them.

To better protect:
 # Separate ICR and FCR queues.
 # Set the right default boundary for each queue type. *5000* (in total) ** for 
{*}FCR{*}, and *20,000* for *ICR* can be a good start.
 # Do we really need multiple queues per message type? The BlockingQueue is 
designed for the scene of using a single queue to sever multiple 
producers/consumers. If there's no particular reason to do so, I suggest 
simplifying the model to use a single queue for a message handler.

 

 


> Better control of SCM container report queue size
> -------------------------------------------------
>
>                 Key: HDDS-7936
>                 URL: https://issues.apache.org/jira/browse/HDDS-7936
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Duong
>            Priority: Major
>
> Today, by default SCM creates 10 in-memory blocking queues for the container 
> report handlers (each is bound with a worker thread). The queues are shared 
> for Full Container Report (FCR) and Incremental Container Report (ICR).  By 
> default, the max_capacity is set to *100,000* per queue, 
> [ref|https://github.com/apache/ozone/blob/master/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java#L499-L499].
> {code:java}
> public static List<BlockingQueue<ContainerReport>> initContainerReportQueue(
>     OzoneConfiguration configuration) {
>   int threadPoolSize = configuration.getInt(getContainerReportConfPrefix()
>           + ".thread.pool.size",
>       OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT);
>   int queueSize = configuration.getInt(getContainerReportConfPrefix()
>           + ".queue.size",
>       OZONE_SCM_EVENT_CONTAINER_REPORT_QUEUE_SIZE_DEFAULT);
>   List<BlockingQueue<ContainerReport>> queues = new ArrayList<>();
>   for (int i = 0; i < threadPoolSize; ++i) {
>     queues.add(new ContainerReportQueue(queueSize));
>   }
>   return queues;
> } {code}
> That indicates a few potential problems:
>  # The number of possible queued messages can reach {*}1M{*}, which is 
> {*}HUGE{*}. Assuming each ICR message is 10KB, 1M ICR messages are 10GB. FCRs 
> are much bigger as each many container reports, and usually is a matter of 
> {*}MBs{*}. 10K FCR messages are enough to swallow all SCM heap memory.
> Such a default configuration indeed doesn't limit any boundary for the 
> container report queues. And that opens a good chance for SCM to choke and 
> crash if it slows down the consumption of reports at any point, or in any 
> event that many datanodes send ICR at the same time.
>  # As mentioned above, ICR and FCR are very different in size (and probably 
> in the ability of SCM to process). It's not good to use the same queue for 
> both types. Separation is needed to differentiate queue size control for them.
> To better protect:
>  # Separate ICR and FCR queues.
>  # Set the right default boundary for each queue type. *5000* (in total) ** 
> for {*}FCR{*}, and *20,000* for *ICR* can be a good start.
>  # Do we really need multiple queues per message type? The BlockingQueue is 
> designed for the scene of using a single queue to sever multiple 
> producers/consumers. If there's no particular reason to do so, I suggest 
> simplifying the model to use a single queue for a message handler.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to