[
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868091#comment-15868091
]
Ariel Weisberg commented on CASSANDRA-8457:
-------------------------------------------
{quote}
Thus, I propose to bring back the SwappingByteBufDataOutputStreamPlus that I
had in an earlier commit. To recap, the basic idea is to provide a DataOutputPlus
that has a backing ByteBuffer that is written to, and when it is filled, it is
written to the netty context and flushed, then allocate a new buffer for more
writes - kinda similar to a BufferedOutputStream, but replacing the backing
buffer when full. Bringing this idea back is also what underpins one of the
major performance things I wanted to address: buffering up smaller messages
into one buffer to avoid going back to the netty allocator for every tiny
buffer we might need - think Mutation acks.
{quote}
What thread is going to be writing to the output stream to serialize the
messages? If it's a Netty thread you can't block inside a serialization method
waiting for the bytes to drain to the socket that is not keeping up. You also
can't wind out of the serialization method and continue it later.
If it's an application thread then it's no longer asynchronous, and a slow
connection can block the application and prevent it from doing, say, a quorum
write to just the fast nodes. You would also need to lock during serialization
or queue concurrently sent messages behind the one currently being written.
With large messages we aren't really fully eliminating the issue, only making
it a factor better. At the other end you still need to materialize a buffer
containing the message + the object graph you are going to materialize. This is
different from how things worked previously where we had a dedicated thread
that would read fixed size buffers and then materialize just the object graph
from that.
To really solve this we need to be able to avoid buffering the entire message
on both the sending and receiving sides. The buffering is worse because we are
allocating contiguous memory and not just doubling the space impact. We could
make it incrementally better by using chains of fixed size buffers so there is
less external fragmentation and allocator overhead. That's still committing
additional memory compared to pre-8457, but at least it's being committed in a
more reasonable way.
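As a rough sketch of the chained fixed-size buffer idea, something like the
following. ChunkedBuffer and its methods are hypothetical names for
illustration, not classes from the patch, and a real version would presumably
take its chunks from the Netty allocator rather than ByteBuffer.allocate.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from the Cassandra patch.
final class ChunkedBuffer {
    private final int chunkSize;
    private final List<ByteBuffer> chunks = new ArrayList<>();

    ChunkedBuffer(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    // Appends src, adding a new fixed-size chunk whenever the last one is full.
    // No existing chunk is ever resized or copied.
    void write(byte[] src) {
        int offset = 0;
        while (offset < src.length) {
            ByteBuffer current = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
            if (current == null || !current.hasRemaining()) {
                current = ByteBuffer.allocate(chunkSize);
                chunks.add(current);
            }
            int n = Math.min(current.remaining(), src.length - offset);
            current.put(src, offset, n);
            offset += n;
        }
    }

    int chunkCount() {
        return chunks.size();
    }

    long bytesWritten() {
        long total = 0;
        for (ByteBuffer b : chunks) total += b.position();
        return total;
    }
}
```

The whole message is still held in memory, but as many small allocations of
one size instead of a single large contiguous one.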
I think the most elegant solution is to use a lightweight thread
implementation. What we will probably be boxed into doing is making the
serialization of result data and other large message portions able to yield.
This will bound the memory committed to large messages to the largest atomic
portion we have to serialize (Cell?).
Something like an output stream being able to say "shouldYield". If you
continue to write it will continue to buffer and not fail, but use memory. Then
serializers can implement a return value for serialize which indicates whether
there is more to serialize. You would check shouldYield after each Cell or some
unit of work when serializing. Most of these large things being serialized are
iterators. The trick will be that most serialization is stateless, and objects
are serialized concurrently, so you can't safely store the serialization state
in the object being serialized.
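A minimal sketch of what that could look like, to make the shape concrete.
YieldingOutput, ResumableSerializer, and the high-water mark are all
hypothetical names and assumptions, not existing Cassandra or Netty APIs, and
byte arrays stand in for Cells. The resume state (the iterator position) lives
in a per-serialization object rather than in the object being serialized.

```java
import java.io.ByteArrayOutputStream;
import java.util.Iterator;

// Hypothetical names throughout; none of these types come from Cassandra.
final class YieldingOutput {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final int highWaterMark;

    YieldingOutput(int highWaterMark) {
        this.highWaterMark = highWaterMark;
    }

    // Writes never fail or block; past the mark they simply use more memory.
    void write(byte[] bytes) {
        pending.write(bytes, 0, bytes.length);
    }

    // Advisory signal: true once the buffered bytes reach the high-water mark.
    boolean shouldYield() {
        return pending.size() >= highWaterMark;
    }

    // Stand-in for handing buffered bytes to the socket; returns the count drained.
    int drain() {
        int n = pending.size();
        pending.reset();
        return n;
    }
}

final class ResumableSerializer {
    // Per-serialization state, kept outside the shared object being serialized.
    private final Iterator<byte[]> cells;

    ResumableSerializer(Iterable<byte[]> cells) {
        this.cells = cells.iterator();
    }

    // Serializes whole cells until done or asked to yield.
    // Returns true if there is more to serialize.
    boolean serialize(YieldingOutput out) {
        while (cells.hasNext()) {
            out.write(cells.next());
            if (out.shouldYield()) return cells.hasNext();
        }
        return false;
    }
}
```

The caller would loop: call serialize, drain the output once the channel is
writable, and call serialize again while it returns true. Memory committed per
message is then bounded by roughly the high-water mark plus one cell.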
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Assignee: Jason Brown
> Priority: Minor
> Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.