I'm getting kind of sick of saying "turn off chunking" so I've been
experimenting with the benchmarks to see what we can do and also get a
feel for what we lose/gain with it.
First thing I've learned: you REALLY want to use the ParallelGC stuff
on multi-core systems. Huge boost with that on. (I wonder if I can
get the unit tests/maven using it..... Hmm.....)
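(For reference, on the Sun JVMs the parallel collector is enabled with
a command-line flag, something like:

    java -XX:+UseParallelGC -Xmx1024m ...

Surefire's <argLine> configuration element should be able to pass the
same flag to the forked test JVMs, though I haven't tried that.)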
Basically, I tested various message sizes in three scenarios:
1) CPU bound - lots of threads sending requests so the CPU is pegged.
(lots of memory: -Xmx1024m)
2) Memory bound - only a couple of threads, but a low -Xmx setting (I
used 64M)
3) Not bound - 2 threads (dual core client machine and dual core
server machine)
There are two important things to measure:
1) Total requests per second
2) Latencies
Basically, by using chunking, "chunks" of the request can be sent to
the server and the server can start processing them while the client
produces more chunks. Thus, the server can start doing the JAXB
deserializing of the first parts of the message while the client is
still using JAXB to write the last part. The big benefit of this is
latency: the server has already deserialized most of the data by the
time the client is done sending it.
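For the curious, here's a minimal sketch of what chunked streaming
looks like at the HttpURLConnection level - made-up endpoint, and not
the actual CXF conduit code:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ChunkedClient {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:9000/echo"); // made-up endpoint
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            // with chunking, each write can go on the wire immediately,
            // so the server can start reading while we're still writing
            conn.setChunkedStreamingMode(4096);
            OutputStream out = conn.getOutputStream();
            byte[] part = new byte[4096];
            for (int i = 0; i < 6; i++) {
                out.write(part); // stands in for JAXB writing the message
                out.flush();
            }
            out.close();
            System.out.println(conn.getResponseCode());
        }
    }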
For the unbound case (case 3):
For SMALL messages (< 2K or so), turning off chunking doesn't seem to
have any adverse effects. On higher latency connections (11mbit
wireless compared to gigabit), it can actually help a bit, as chunking
tends to send an extra network packet.
However, once it gets above 8K or so, chunking starts to really help.
Once it gets up to about 24K, the difference is pretty big. The
latencies are much lower, so the unbound clients can send more
requests: nearly 30% higher TPS. If your benchmark is a few threads
pounding on the server, you really want the chunking turned on.
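For anyone who wants to reproduce this: chunking can be toggled
programmatically on the client conduit. A sketch using the
HTTPClientPolicy API, assuming "port" is a JAX-WS proxy created by
CXF:

    import org.apache.cxf.frontend.ClientProxy;
    import org.apache.cxf.transport.http.HTTPConduit;
    import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

    HTTPConduit conduit = (HTTPConduit) ClientProxy.getClient(port).getConduit();
    HTTPClientPolicy policy = new HTTPClientPolicy();
    policy.setAllowChunking(true); // or false to turn it off
    conduit.setClient(policy);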
Case 2 gets similar results. Because the HTTPUrlConnection needs to
buffer the full request in the unchunked case, it puts a big load on
the heap and the garbage collector. (Again, ParallelGC helps.) For
small messages, the two are comparable. However, as the message
grows, chunking helps keep the heap in better shape and puts less
strain on the GC. At some point, with chunking on, the messages work,
and with chunking off, we get OutOfMemoryErrors. (I had messages
around 10M at that point.) Chunking was still working all the way up
to 50M.
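The buffering is the root of it: unless the connection is told the
length up front or chunking is enabled, HttpURLConnection holds the
entire body in memory so it can compute the Content-Length. A sketch
of the three modes (Java 5+):

    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);

    // 1) default: everything written to the output stream is buffered
    //    in memory so Content-Length can be set - OOM risk at 10M+

    // 2) fixed-length streaming: no buffering, but the serialized size
    //    would have to be known before JAXB runs, which it isn't
    // conn.setFixedLengthStreamingMode(contentLength);

    // 3) chunked streaming: no buffering, no length needed
    conn.setChunkedStreamingMode(4096);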
In case 1, where it's CPU bound, chunking or no chunking had very
little effect. The chunking allows the server to process things ahead
of time, but that only really works well if the client/server has CPU
cycles to spare. Actually, the chunking takes a little more CPU work
to decode, so the non-chunked case is very slightly faster (barely
measurable, like 1-2%).
So, where does this leave us? I'm not sure. We COULD add a
"chunkingThreshold" parameter to the http conduit client parameters,
defaulted to something like 4K. Buffer up to that amount, and if the
request completes (stream.close() called) before the buffer is full,
set the content length and go non-chunked. Once it goes over, go
chunked. That would allow small messages to work with the older
services.
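Something like this, as a rough sketch - the class and names are made
up for illustration, not a patch:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;

    // Buffer up to a threshold; if close() arrives first, send with a
    // Content-Length, otherwise flip to chunked mid-stream.
    public class ThresholdOutputStream extends OutputStream {
        private final HttpURLConnection conn;
        private final int threshold; // e.g. 4K default
        private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private OutputStream wire; // null until we pick a mode

        public ThresholdOutputStream(HttpURLConnection conn, int threshold) {
            this.conn = conn;
            this.threshold = threshold;
        }

        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        public void write(byte[] b, int off, int len) throws IOException {
            if (wire == null && buffer.size() + len > threshold) {
                // went over the threshold: switch to chunking
                conn.setChunkedStreamingMode(threshold);
                wire = conn.getOutputStream();
                buffer.writeTo(wire);
                buffer = null;
            }
            if (wire != null) {
                wire.write(b, off, len);
            } else {
                buffer.write(b, off, len);
            }
        }

        public void close() throws IOException {
            if (wire == null) {
                // whole message fit in the buffer: go non-chunked
                conn.setFixedLengthStreamingMode(buffer.size());
                wire = conn.getOutputStream();
                buffer.writeTo(wire);
            }
            wire.close();
        }
    }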
The question is: will that help or make things worse? Would we get
support requests like "can CXF not handle big messages?" or similar
when it works for the small requests, but suddenly stops working for
the larger requests?
Anyway, anyone else have some thoughts?
---
Daniel Kulp
[EMAIL PROTECTED]
http://www.dankulp.com/blog