[
https://issues.apache.org/jira/browse/DISPATCH-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307148#comment-17307148
]
Ken Giusti edited comment on DISPATCH-2006 at 3/23/21, 3:10 PM:
----------------------------------------------------------------
To illustrate what is causing the memory growth, consider the simplest
data flow:
One router.
1 endpoint sender (SOURCE)
1 endpoint receiver (SINK)
A single "infinite" streaming message.
There are 5 points in the byte stream where buffering/flow control occurs:
{{SOURCE=>[network]=>[Session(IN)]=>[Q2]=>[Session(OUT)]=>[network]=>SINK}}
Let's ignore the network buffering since it's outside dispatch's
control and applies to both sides of the flow. That leaves us with three
points of flow control/buffering: Session(IN), Q2, and Session(OUT).
The Q2 flow control threshold is 128KB.
The Session(OUT) buffering level is limited by Q3 to 512KB.
The Q2 and Q3 thresholds are not configurable (they are compile-time constants).
Session(IN) buffering level is configurable, either indirectly using
the maxFrameSize and maxSessionFrames configuration attributes on
connectors and listeners, or directly by setting the byte threshold
via policy.
If maxSessionFrames is left unspecified, the code computes a default
flow control threshold for Session(IN). (The code refers to this
threshold as the "incoming_capacity".)
The actual value for incoming_capacity depends on the CPU's word size.
If the CPU is 64-bit, then the default incoming capacity is
35,184,372,072,448 bytes. Yes, that's 32 _*TERAbytes*_. If the word size
is 32-bit, a more "reasonable" 2GB incoming capacity is used.
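As a rough check of those numbers, here is a minimal sketch of the arithmetic. The function name and parameters are hypothetical, not the router's code. It assumes the window is simply frame size times frame count, and that the 64-bit default corresponds to (2^31 - 1) frames of 16384 bytes each; that decomposition is an inference that happens to reproduce the figure quoted above.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def incoming_capacity(max_frame_size=16384, max_session_frames=0, word_bits=64):
    """Hypothetical model of the Session(IN) byte threshold.

    Assumption: max_session_frames == 0 means "unspecified", in which case a
    default is computed from the word size, as described in the text.
    """
    if max_session_frames == 0:
        if word_bits >= 64:
            # Assumed decomposition: (2^31 - 1) frames of max_frame_size bytes.
            return INT32_MAX * max_frame_size
        # 32-bit case: roughly 2GB.
        return INT32_MAX
    # Explicitly configured window: frames * bytes per frame.
    return max_frame_size * max_session_frames


print(incoming_capacity())                        # 64-bit default
print(incoming_capacity(word_bits=32))            # 32-bit default
print(incoming_capacity(max_session_frames=100))  # explicitly configured
```

With a 16384 byte frame size, setting maxSessionFrames to 100 would bring the window down to about 1.6MB under this model.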
Let's depict the data buffering again, with threshold limits:
{{SOURCE==>[32TB]===>[128K]==>[512K]==>SINK}}
{{      Session(IN)    Q2       Q3}}
If SINK consumes at a slower rate than SOURCE can produce, the
back pressure will first fill the 512K Q3 buffer, then the 128K Q2
buffer, and finally the 32TB Session(IN) buffer.
But that 32TB buffer is for all intents and purposes an infinite
buffer with no flow control since the threshold will never be reached.
Consider now what happens in the multi-router case:
{{SOURCE=>[32TB]=>[128K]=>[512K]=>[32TB]=>[128K]=>[512K]=>SINK}}
{{        [----- Router A -----]  [----- Router B -----]}}
This will result in the buffering occurring "downstream" at Router B,
as Router B's 32TB Session(IN) buffer will need to fill before
back pressure will occur on Router A.
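To make the two-router behaviour concrete, here is a toy simulation. It is only a sketch under stated assumptions, not how the router is implemented: time advances in discrete ticks, each stage forwards as much as the next stage has room for, and the stage names, the simulate() helper, and the 64KB/16KB per-tick rates are all made up for illustration. Only the thresholds come from the diagram above.

```python
KB = 1024
TB32 = 35_184_372_072_448  # default 64-bit Session(IN) capacity quoted above

# Buffering stages in SOURCE -> SINK order, with thresholds from the diagram.
STAGES = [
    ("A.Session(IN)", TB32),
    ("A.Q2", 128 * KB),
    ("A.Q3", 512 * KB),
    ("B.Session(IN)", TB32),
    ("B.Q2", 128 * KB),
    ("B.Q3", 512 * KB),
]


def simulate(stages, produce, consume, ticks):
    """Return the bytes held in each stage after the given number of ticks."""
    caps = [cap for _, cap in stages]
    level = [0] * len(stages)
    for _ in range(ticks):
        # SINK drains the last stage at its own (slower) rate.
        level[-1] -= min(level[-1], consume)
        # Forward data one hop, starting nearest SINK so that freed space
        # propagates upstream within the same tick.
        for i in range(len(stages) - 2, -1, -1):
            moved = min(level[i], caps[i + 1] - level[i + 1])
            level[i] -= moved
            level[i + 1] += moved
        # SOURCE writes into the first stage, limited only by its threshold.
        level[0] += min(produce, caps[0] - level[0])
    return {name: lvl for (name, _), lvl in zip(stages, level)}


levels = simulate(STAGES, produce=64 * KB, consume=16 * KB, ticks=1000)
for name, lvl in levels.items():
    print(f"{name:>14}: {lvl // KB} KB")
```

In this model Router A's stages never hold more than one tick of SOURCE output each, Router B's Q2 and Q3 sit at their thresholds, and Router B's Session(IN) keeps growing for as long as the run continues.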
Adaptors employ a buffering scheme that is different from the above.
While adaptors do enforce Q2, no adaptor has the concept of a session
window. Instead there is a small 16-entry buffer offered by the
Proton raw connection. Assuming a 512-byte buffer size, this raw
connection buffer is merely 512 * 16 = 8KB.
So for two TCP endpoints:
{{SOURCE=>[8KB]=>[128K]=>...=>[128K]=>[8KB]=>SINK}}
Now map that to a two router deployment:
{{SOURCE=>[8KB]=>[128K]=>[512K]=>[32TB]=>[128K]=>[8KB]=>SINK}}
{{        [----- Router A ----]  [----- Router B ----]}}
The same effect occurs as with pure AMQP when SINK applies back pressure: the
downstream router ends up with unlimited buffer growth.
TL;DR - Session(IN) default threshold is incorrect.
TL;DR^2 - Memory growth is not TCP specific.
> Set stricter default maxSessionFrames to avoid router OOM under load
> --------------------------------------------------------------------
>
> Key: DISPATCH-2006
> URL: https://issues.apache.org/jira/browse/DISPATCH-2006
> Project: Qpid Dispatch
> Issue Type: Improvement
> Components: Router Node
> Affects Versions: 1.15.0
> Reporter: Ken Giusti
> Assignee: Ken Giusti
> Priority: Major
> Fix For: 1.16.0
>
>
> By default, sessions created by the router have an essentially unlimited
> session window size. This can result in unconstrained growth of router heap
> memory during periods of high traffic congestion.
>
> The router needs to use reasonable session incoming window limits by default.
> It's likely that different types of connections - endpoints, inter-router,
> etc - should have different default window sizes. For example it's
> reasonable to provide a bigger window for inter-router links than simple
> endpoint connections.