Github user franz1981 commented on the issue:
https://github.com/apache/activemq-artemis/pull/1918
@clebertsuconic Yep , don't worry I'm glad that you have taken a look on
it! Will answer inline..
> i - There's a buffer sitting on the ProtonSenderContext now that may be
unused for a while. What if you had 1000 connections at one time.. and then all
of them go inactive? I don't understand how this will be Reducing footprint. I
see quite the contrary here.
I think that optimizing for the common case would be the bigger fish, but I
don't know if what you have described is the common pattern...it is? If yes
you're right, but consider that although Netty will release it to the pool, the
memory footprint won't goes away. To solve it I will use `SoftReference` that
will react on memory pressure demands, just to be safe.
Re memory footprint, it helps due to how Netty heap pools work: just
allocating 10 bytes on it from any thread will trigger all the netty heap
arenas to be allocated: that means to have more than (using the default config)
16 MB * 16 = 250 MB of allocated heaps just sitting to perform buffer copies.
And the sad fact is that it will add a constant memory footprint to Artemis
when the broker is idle for real too.
Let me show you it...
That's on master (with the patch of Tim), after forcing a Full GC: 250 MB
of Netty arena remaining allocated

That's is with this patch (and Tim commit too): all the allocated bytes
disappear

For the graph is not visible the throughput improvement and we can just
focus just on memory: it is pretty visbile that the memory used is lower than
with pooled Netty ByteBuf too and that's because with too many threads
(producers/consumers) the Netty pool isn't able to scale and will fallback to
produce garbage.
I hope to have explained my concerns.
> ii - It is not clear to me when the buffer would be busy, since the
ProtonSenderContext represents a single consumer. Maybe it would.. but it
shouldn't. if that's happening you will be sending a large heap buffer to GC.
That's why I needed you and @tabish121 :P : if it is uncontended the logic
could be made much simpler.
From my tests (with 64 pairs of producers/consumers) seems uncontended so I
have supposed that the "busy" case is the uncommon one and just having one
byte[] created that will die very soon isn't a big deal for any GC (if is
uncommon). Re how much big, with G1 (our defualt GC) `-XX:G1HeapRegionSize` is
2 MB AFAIK and any byte[] allocation > 50% of it ie 1MB is "less optimized".
Let me re-write it with `SoftReference` and without 'AtomicReference` (it
isn't contended) and it could address all your point, thanks!!
---