[
https://issues.apache.org/jira/browse/CASSANDRA-20052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897258#comment-17897258
]
Dmitry Konstantinov edited comment on CASSANDRA-20052 at 11/11/24 7:30 PM:
---------------------------------------------------------------------------
{quote}I was expecting in the 3rd case to behave like the 1st and 2nd cases
once we have consumed all the capacity with the first 16 messages.
{quote}
But wouldn't it be a violation of the basic (throwOnOverload = false) rate
limiting semantics if we started to fail requests once all the capacity has been
consumed? My impression is that the whole idea of (throwOnOverload = false) rate
limiting is precisely not to throw exceptions as the way to limit the rate. For
me there is no big difference between 32 "large" messages of 16 MiB, 32 x 16
still "large" messages of 1 MiB, and 8 x 32 x 16 "small" messages of 128 KiB.
Throwing OverloadedException for some valid message sizes and not for others
would be confusing for users (who actually expect no OverloadedException at all).
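A quick back-of-the-envelope check of the three cases above (just arithmetic, not project code):
{code:java}
// All three traffic patterns carry the same total volume, so from the
// rate limiter's point of view they are arguably equivalent:
long largeMessages  = 32L * 16 * 1024 * 1024;       // 32   x 16 MiB  = 512 MiB
long mediumMessages = (32L * 16) * 1024 * 1024;     // 512  x 1 MiB   = 512 MiB
long smallMessages  = (8L * 32 * 16) * 128 * 1024;  // 4096 x 128 KiB = 512 MiB
{code}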
-------------------
If we want to fix the rate limiting logic as well (it was not the intention of
my PR originally, I wanted to solve the issues one by one :)), then what about
the following approach (a rough sketch of the combined flow is below)?
# Initially we check the message size against min(max_mutation_size,
native_transport_max_request_data_in_flight_per_ip,
native_transport_max_request_data_in_flight). If it is bigger, we raise an
InvalidRequestException and skip the message.
# Then we try to acquireCapacity(header, endpointReserve, globalReserve) and,
if it returns false, we return false from processFirstFrameOfLargeMessage to
pause the connection (effectively going back to the way it was done before
[https://github.com/beobal/cassandra/commit/0aae5b4921a3ce6c21a8a0e977624c877b19cd5b#diff-8e8dd0c6394bca7f4ac929d4b1471bd8877b222af1d48b2ebac331d458326de2R340]).
The difference with the old logic is that we now know the message size is <=
native_transport_max_request_data_in_flight_per_ip and
native_transport_max_request_data_in_flight, so starvation is not possible in
the following case: a connection blocked forever because somebody tried to send
a single message bigger than the total capacity. Once other messages on the same
or other CQL connections are processed and their capacity is returned to the
limits, we are sure there is enough capacity to acquire for the current large
message.
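For illustration only, a minimal sketch of the proposed order of checks; this is
not the actual CQLMessageHandler code. The tryAcquireCapacity parameter stands in
for acquireCapacity(header, endpointReserve, globalReserve), a false result
corresponds to returning false from processFirstFrameOfLargeMessage, and the real
handler would throw InvalidRequestException rather than IllegalArgumentException
(used here only to keep the sketch self-contained):
{code:java}
import java.util.function.LongPredicate;

// Illustrative sketch of the proposed admission logic for the first frame of a
// large CQL message (hypothetical class and method names).
final class LargeMessageAdmission
{
    /** @return true to continue processing, false to pause the connection. */
    static boolean admitFirstFrame(long messageSize,
                                   long maxMutationSize,
                                   long perIpCapacity,
                                   long globalCapacity,
                                   LongPredicate tryAcquireCapacity)
    {
        // 1. Reject a message that could never fit under the configured limits,
        //    instead of letting it block the connection forever.
        long hardLimit = Math.min(maxMutationSize, Math.min(perIpCapacity, globalCapacity));
        if (messageSize > hardLimit)
            throw new IllegalArgumentException("Message of size " + messageSize +
                    " bytes exceeds the allowed maximum of " + hardLimit + " bytes");

        // 2. The message is admissible but capacity may be temporarily exhausted:
        //    pause the connection (false) rather than throwing. Since
        //    messageSize <= total capacity, other in-flight messages will
        //    eventually release enough capacity, so the connection cannot starve.
        return tryAcquireCapacity.test(messageSize);
    }
}
{code}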
> Size of CQL messages is not limited in V5 protocol logic
> --------------------------------------------------------
>
> Key: CASSANDRA-20052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20052
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Attachments: cassandra_rate_limit.svg
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Size of CQL messages is not limited in V5 protocol logic.
> - After the introduction of v5 frames we no longer have any CQL message size
> limit: native_transport_max_frame_size_in_mb, which imposed such a limit in the
> pre-V5 epoch, now applies only to pre-V5 protocol sessions; for v5 sessions it
> is applied only to the initial STARTUP/OPTIONS message handling and is not
> checked anywhere in the v5 logic. So, currently a v5 CQL message of any size
> can be sent to the Cassandra server.
> - The overload logic simply allows huge messages to be processed "for free" to
> avoid starvation, so it provides no protection against the most dangerous
> requests from a memory pressure point of view.
> - The situation is even more dangerous: the v5 framing logic is enabled right
> after the AUTH response, so we do not limit the message size even for
> AUTH_RESPONSE messages from a client. This can be used as a DoS attack: a
> non-authenticated client can send a huge username/password to the Cassandra
> server to cause GC trouble or even kill it.
> An easy example (using the DataStax Java driver):
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.Arrays;
>
> import com.datastax.oss.driver.api.core.CqlSession;
> import com.datastax.oss.driver.internal.core.metadata.DefaultEndPoint;
>
> public class TestBigAuthRequest {
>     public static void main(String[] args) {
>         // ~500 MB "password" sent to the server in the AUTH_RESPONSE message
>         String password = getString(500_000_000, '-');
>         try (CqlSession session = CqlSession.builder()
>                 .addContactEndPoint(new DefaultEndPoint(new InetSocketAddress("localhost", 9042)))
>                 .withAuthCredentials("cassandra", password)
>                 .withLocalDatacenter("datacenter1")
>                 .build()) {
>             session.execute("select * from system.local");
>         }
>     }
>
>     // Builds a string of the requested length filled with the given character
>     private static String getString(int length, char charToFill) {
>         if (length > 0) {
>             char[] array = new char[length];
>             Arrays.fill(array, charToFill);
>             return new String(array);
>         }
>         return "";
>     }
> }
> {code}
> A thread stack of such an invocation (captured to show the execution flow):
> {code:java}
> "nioEventLoopGroup-5-21@9164" prio=10 tid=0x86 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>       at org.apache.cassandra.transport.messages.AuthResponse$1.decode(AuthResponse.java:45)
>       at org.apache.cassandra.transport.messages.AuthResponse$1.decode(AuthResponse.java:39)
>       at org.apache.cassandra.transport.Message$Decoder.decodeMessage(Message.java:432)
>       at org.apache.cassandra.transport.Message$Decoder$RequestDecoder.decode(Message.java:467)
>       at org.apache.cassandra.transport.Message$Decoder$RequestDecoder.decode(Message.java:459)
>       at org.apache.cassandra.transport.CQLMessageHandler.processRequest(CQLMessageHandler.java:377)
>       at org.apache.cassandra.transport.CQLMessageHandler$LargeMessage.onComplete(CQLMessageHandler.java:755)
>       at org.apache.cassandra.net.AbstractMessageHandler$LargeMessage.supply(AbstractMessageHandler.java:561)
>       at org.apache.cassandra.net.AbstractMessageHandler.processSubsequentFrameOfLargeMessage(AbstractMessageHandler.java:257)
>       at org.apache.cassandra.net.AbstractMessageHandler.processIntactFrame(AbstractMessageHandler.java:229)
>       at org.apache.cassandra.net.AbstractMessageHandler.process(AbstractMessageHandler.java:216)
>       at org.apache.cassandra.transport.CQLMessageHandler.process(CQLMessageHandler.java:147)
>       at org.apache.cassandra.net.FrameDecoder.deliver(FrameDecoder.java:330)
>       at org.apache.cassandra.net.FrameDecoder.channelRead(FrameDecoder.java:294)
>       at org.apache.cassandra.net.FrameDecoder.channelRead(FrameDecoder.java:277)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>       at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>       at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
>       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>       at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>       at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:829)
> {code}
> The provided PR (https://github.com/apache/cassandra/pull/3655) contains a
> fix for the issue which introduces 2 new parameters:
> native_transport_max_message_size - to limit the size of any CQL message
> native_transport_max_auth_message_size (default = 128KiB) - to limit the auth
> response message size more strictly and add extra protection against a
> possible DoS attack.
> Design questions:
> * The current implementation closes a CQL connection if a message is bigger
> than the limits. Logic to skip the message body could be implemented to keep
> the connection usable, but it is more complicated and error-prone.
> * The tricky question is the default value for
> native_transport_max_message_size (see the sketch below):
> on one side, we want it to be no more than
> min(native_transport_max_request_data_in_flight_per_ip,
> native_transport_max_request_data_in_flight) to reduce the chances of hitting
> the branch of logic where error handling does not work;
> on the other side, min(native_transport_max_request_data_in_flight_per_ip,
> native_transport_max_request_data_in_flight) can be too small, and there is a
> chance of breaking backward compatibility for existing deployments where
> people use large messages and small heaps (although that is not a good idea).
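> A minimal sketch of the kind of startup-time validation this trade-off implies,
> mirroring the internode checks listed under related observation 2 below; the
> method name and the plain long parameters are assumptions for illustration, not
> the actual Config/DatabaseDescriptor API:
> {code:java}
> // Hypothetical validation: reject a configuration where a single CQL message
> // could exceed the in-flight capacity limits (cf. internode_max_message_size).
> static void validateNativeTransportLimits(long maxMessageSizeBytes,
>                                           long dataInFlightPerIpBytes,
>                                           long dataInFlightGlobalBytes)
> {
>     long capacity = Math.min(dataInFlightPerIpBytes, dataInFlightGlobalBytes);
>     if (maxMessageSizeBytes > capacity)
>         throw new IllegalArgumentException(
>             "native_transport_max_message_size (" + maxMessageSizeBytes + " bytes) must not exceed " +
>             "min(native_transport_max_request_data_in_flight_per_ip, " +
>             "native_transport_max_request_data_in_flight) = " + capacity + " bytes");
> }
> {code}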
> Related observations:
> 1) https://issues.apache.org/jira/browse/CASSANDRA-16886 - Reduce
> native_transport_max_frame_size_in_mb (from 256M to 16M)
> 2) Corresponding logic exists for the Cassandra internode protocol: a message
> size limit is defined, and the max size of a single message is validated to be
> no larger than the rate limiting (reserve capacity) parameters:
> internode_max_message_size =
>   min(internode_application_receive_queue_reserve_endpoint_capacity,
>       internode_application_send_queue_reserve_endpoint_capacity)
> internode_application_receive_queue_reserve_endpoint_capacity = 128MiB
> internode_application_send_queue_reserve_endpoint_capacity = 128MiB
> internode_max_message_size <= internode_application_receive_queue_reserve_endpoint_capacity
> internode_max_message_size <= internode_application_receive_queue_reserve_global_capacity
> internode_max_message_size <= internode_application_send_queue_reserve_endpoint_capacity
> internode_max_message_size <= internode_application_send_queue_reserve_global_capacity
> 3) Request types according to CQL specification:
> 4.1.1. STARTUP, in normal cases should be small
> 4.1.2. AUTH_RESPONSE, in normal cases should be small
> 4.1.3. OPTIONS, in normal cases should be small
> 4.1.4. QUERY, in normal cases should be small
> 4.1.5. PREPARE, in normal cases should be small
> 4.1.6. EXECUTE <-- potentially large in the case of inserts, max_mutation_size = commitlog_segment_size / 2, where commitlog_segment_size = 32MiB by default (see the computation after this list)
> 4.1.7. BATCH <-- potentially large, max_mutation_size = commitlog_segment_size / 2, where commitlog_segment_size = 32MiB by default
> 4.1.8. REGISTER, in normal cases should be small
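> For reference, a quick computation (just arithmetic, not project code) of the
> largest legitimate mutation under the defaults above, which is one input for
> choosing a default native_transport_max_message_size:
> {code:java}
> long commitlogSegmentSize = 32L * 1024 * 1024;        // 32 MiB default
> long maxMutationSize      = commitlogSegmentSize / 2; // 16 MiB
> {code}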