[
https://issues.apache.org/jira/browse/HDDS-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duong updated HDDS-7936:
------------------------
Description:
Today, by default SCM creates 10 in-memory blocking queues for the container
report handlers (each is bound with a worker thread). The queues are shared for
Full Container Report (FCR) and Incremental Container Report (ICR). By
default, the max_capacity is set to *100,000* per queue,
[ref|https://github.com/apache/ozone/blob/master/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java#L499-L499].
{code:java}
public static List<BlockingQueue<ContainerReport>> initContainerReportQueue(
OzoneConfiguration configuration) {
int threadPoolSize = configuration.getInt(getContainerReportConfPrefix()
+ ".thread.pool.size",
OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT);
int queueSize = configuration.getInt(getContainerReportConfPrefix()
+ ".queue.size",
OZONE_SCM_EVENT_CONTAINER_REPORT_QUEUE_SIZE_DEFAULT);
List<BlockingQueue<ContainerReport>> queues = new ArrayList<>();
for (int i = 0; i < threadPoolSize; ++i) {
queues.add(new ContainerReportQueue(queueSize));
}
return queues;
} {code}
That indicates a few potential problems:
# The number of possible queued messages can reach {*}1M{*}, which is
{*}HUGE{*}. Assuming each ICR message is 10KB, 1M ICR messages are 10GB. FCRs
are much bigger as each many container reports, and usually is a matter of
{*}MBs{*}. 10K FCR messages are enough to swallow all SCM heap memory.
Such a default configuration indeed doesn't limit any boundary for the
container report queues. And that opens a good chance for SCM to choke and
crash if it slows down the consumption of reports at any point, or in any event
that many datanodes send ICR at the same time.
# As mentioned above, ICR and FCR are very different in size (and probably in
the ability of SCM to process). It's not good to use the same queue for both
types. Separation is needed to differentiate queue size control for them.
To better protect:
# Separate ICR and FCR queues.
# Set the right default boundary for each queue type. *5000* (in total) ** for
{*}FCR{*}, and *20,000* for *ICR* can be a good start.
# Do we really need multiple queues per message type? The BlockingQueue is
designed for the scene of using a single queue to sever multiple
producers/consumers. If there's no particular reason to do so, I suggest
simplifying the model to use a single queue for a message handler.
was:
Today, by default SCM creates 10 in-memory blocking queues for the container
report handlers (each is bound with a worker thread). The queues are shared for
Full Container Report (FCR) and Incremental Container Report (ICR). By
default, the max_capacity is set t *100,000* per queue, .
{code:java}
public static List<BlockingQueue<ContainerReport>> initContainerReportQueue(
OzoneConfiguration configuration) {
int threadPoolSize = configuration.getInt(getContainerReportConfPrefix()
+ ".thread.pool.size",
OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT);
int queueSize = configuration.getInt(getContainerReportConfPrefix()
+ ".queue.size",
OZONE_SCM_EVENT_CONTAINER_REPORT_QUEUE_SIZE_DEFAULT);
List<BlockingQueue<ContainerReport>> queues = new ArrayList<>();
for (int i = 0; i < threadPoolSize; ++i) {
queues.add(new ContainerReportQueue(queueSize));
}
return queues;
} {code}
That indicates a few potential problems:
# The number of possible queued messages can reach {*}1M{*}, which is
{*}HUGE{*}. Assuming each ICR message is 10KB, 1M ICR messages are 10GB. FCRs
are much bigger as each many container reports, and usually is a matter of
{*}MBs{*}. 10K FCR messages are enough to swallow all SCM heap memory.
Such a default configuration indeed doesn't limit any boundary for the
container report queues. And that opens a good chance for SCM to choke and
crash if it slows down the consumption of reports at any point, or in any event
that many datanodes send ICR at the same time.
# As mentioned above, ICR and FCR are very different in size (and probably in
the ability of SCM to process). It's not good to use the same queue for both
types. Separation is needed to differentiate queue size control for them.
To better protect:
# Separate ICR and FCR queues.
# Set the right default boundary for each queue type. *5000* (in total) ** for
{*}FCR{*}, and *20,000* for *ICR* can be a good start.
# Do we really need multiple queues per message type? The BlockingQueue is
designed for the scene of using a single queue to sever multiple
producers/consumers. If there's no particular reason to do so, I suggest
simplifying the model to use a single queue for a message handler.
> Better control of SCM container report queue size
> -------------------------------------------------
>
> Key: HDDS-7936
> URL: https://issues.apache.org/jira/browse/HDDS-7936
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Duong
> Priority: Major
>
> Today, by default SCM creates 10 in-memory blocking queues for the container
> report handlers (each is bound with a worker thread). The queues are shared
> for Full Container Report (FCR) and Incremental Container Report (ICR). By
> default, the max_capacity is set to *100,000* per queue,
> [ref|https://github.com/apache/ozone/blob/master/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfigKeys.java#L499-L499].
> {code:java}
> public static List<BlockingQueue<ContainerReport>> initContainerReportQueue(
> OzoneConfiguration configuration) {
> int threadPoolSize = configuration.getInt(getContainerReportConfPrefix()
> + ".thread.pool.size",
> OZONE_SCM_EVENT_THREAD_POOL_SIZE_DEFAULT);
> int queueSize = configuration.getInt(getContainerReportConfPrefix()
> + ".queue.size",
> OZONE_SCM_EVENT_CONTAINER_REPORT_QUEUE_SIZE_DEFAULT);
> List<BlockingQueue<ContainerReport>> queues = new ArrayList<>();
> for (int i = 0; i < threadPoolSize; ++i) {
> queues.add(new ContainerReportQueue(queueSize));
> }
> return queues;
> } {code}
> That indicates a few potential problems:
> # The number of possible queued messages can reach {*}1M{*}, which is
> {*}HUGE{*}. Assuming each ICR message is 10KB, 1M ICR messages are 10GB. FCRs
> are much bigger as each many container reports, and usually is a matter of
> {*}MBs{*}. 10K FCR messages are enough to swallow all SCM heap memory.
> Such a default configuration indeed doesn't limit any boundary for the
> container report queues. And that opens a good chance for SCM to choke and
> crash if it slows down the consumption of reports at any point, or in any
> event that many datanodes send ICR at the same time.
> # As mentioned above, ICR and FCR are very different in size (and probably
> in the ability of SCM to process). It's not good to use the same queue for
> both types. Separation is needed to differentiate queue size control for them.
> To better protect:
> # Separate ICR and FCR queues.
> # Set the right default boundary for each queue type. *5000* (in total) **
> for {*}FCR{*}, and *20,000* for *ICR* can be a good start.
> # Do we really need multiple queues per message type? The BlockingQueue is
> designed for the scene of using a single queue to sever multiple
> producers/consumers. If there's no particular reason to do so, I suggest
> simplifying the model to use a single queue for a message handler.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]