I'm getting kind of sick of saying "turn off chunking" so I've been experimenting with the benchmarks to see what we can do and also get a feel for what we lose/gain with it.
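For anyone following along at home: the knob I keep pointing people at is the allowChunking flag on the client policy. Assuming you're going through the standard HTTPConduit API, and with "port" standing in for your JAX-WS proxy, flipping it programmatically looks something like:

    import org.apache.cxf.endpoint.Client;
    import org.apache.cxf.frontend.ClientProxy;
    import org.apache.cxf.transport.http.HTTPConduit;
    import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

    // Grab the conduit behind the JAX-WS proxy and toggle chunking
    Client client = ClientProxy.getClient(port);
    HTTPConduit conduit = (HTTPConduit) client.getConduit();
    HTTPClientPolicy policy = new HTTPClientPolicy();
    policy.setAllowChunking(false);   // false = buffer and send Content-Length
    conduit.setClient(policy);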

First thing I've learned: you REALLY want to use the ParallelGC stuff on multi-core systems. Huge boost with that on. (I wonder if I can get the unit tests/maven using it..... Hmm.....)
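That's just the stock HotSpot flag, e.g.:

    java -XX:+UseParallelGC -Xmx1024m ...

For the unit tests/maven case I'd guess it means passing the same flag through surefire's argLine, but I haven't tried that yet.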

Basically, I tested various message sizes in three scenarios:
1) CPU bound - lots of threads sending requests so the CPU is pegged (plenty of memory, -Xmx1024m)
2) Memory bound - only a couple of threads, but a low -Xmx setting (I used 64M)
3) Not bound - 2 threads (dual-core client machine and dual-core server machine)

There are two important things to measure:
1) Total requests per second
2) Latencies


Basically, with chunking, "chunks" of the request can be sent to the server, and the server can start processing them while the client produces more. Thus, the server can start doing the JAXB deserialization of the first parts of the message while the client is still using JAXB to write the last part. The big benefit is latency: the server has already deserialized most of the data by the time the client finishes sending it.


For the unbound case (case 3):

For SMALL messages (< 2K or so), turning off chunking doesn't seem to have any adverse effects. In fact, on higher-latency connections (11 Mbit wireless compared to gigabit), turning it off can help a bit, as chunking tends to send an extra network packet.

However, once the message gets above 8K or so, chunking starts to really help.

Once it gets up to about 24K, the difference is pretty big: the latencies are much lower, so the unbound clients can send more requests, nearly 30% higher TPS. If your benchmark is a few threads pounding on the server, you really want chunking turned on.


Case 2 gets similar results. Because HttpURLConnection needs to buffer the full request in the unchunked case, it puts a big load on the heap and the garbage collector (again, ParallelGC helps). For small messages, the two are comparable. However, as the message grows, chunking keeps the heap in better shape and puts less strain on the GC. At some point the messages still work with chunking on but we get OutOfMemoryErrors with chunking off (I was at messages around 10M at that point). Chunking kept working all the way up to 50M.
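That buffering is plain JDK behavior rather than anything CXF-specific: unless the connection is put into a streaming mode, HttpURLConnection holds the whole body in memory so it can compute the Content-Length header up front. A minimal sketch of the two modes (hypothetical URL, no error handling):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://host/service").openConnection();
    conn.setDoOutput(true);

    // Chunked: the body is streamed out in 4K chunks; nothing piles up in the heap.
    conn.setChunkedStreamingMode(4096);

    // Default (no streaming mode set): the connection buffers the ENTIRE body
    // in memory so it can send a Content-Length header first.

    OutputStream out = conn.getOutputStream();
    // ... JAXB marshals the message straight into 'out' ...
    out.close();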


In case 1, where it's CPU bound, chunking or no chunking had very little effect. Chunking allows the server to process things ahead of time, but that only really works well if the client and server have CPU cycles to spare. In fact, chunking takes a little more CPU work to decode, so the non-chunked case is very slightly faster (barely measurable, like 1-2%).


So, where does this leave us? I'm not sure. We COULD add a "chunkingThreshold" parameter to the HTTP conduit client parameters, defaulted to something like 4K: buffer up to that amount, and if the request completes (stream.close() called) before the buffer fills, set the Content-Length and send it non-chunked; once it goes over the threshold, switch to chunking. That would allow small messages to keep working with the older services. The question is: will that help or make things worse? Would we get support requests like "can CXF not handle big messages?" when it works for the small requests but suddenly stops working for the larger ones?
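To make the idea concrete, here's a rough sketch of what that wrapper stream could look like. The class name and the HttpURLConnection plumbing are hypothetical, purely for illustration, not actual CXF code:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;

    // Buffers up to 'threshold' bytes; if close() arrives first, the request
    // goes out non-chunked with a Content-Length. Otherwise it switches to
    // chunked streaming and flushes the buffer down the wire.
    class ThresholdingOutputStream extends OutputStream {
        private final HttpURLConnection conn;
        private final int threshold;
        private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private OutputStream wire;   // null until we commit to a mode

        ThresholdingOutputStream(HttpURLConnection conn, int threshold) {
            this.conn = conn;
            this.threshold = threshold;
        }

        public void write(int b) throws IOException {
            if (wire != null) {
                wire.write(b);
                return;
            }
            buffer.write(b);
            if (buffer.size() > threshold) {
                conn.setChunkedStreamingMode(threshold);  // went over: chunk it
                wire = conn.getOutputStream();
                buffer.writeTo(wire);
                buffer = null;
            }
        }

        public void close() throws IOException {
            if (wire == null) {   // finished under the threshold
                conn.setFixedLengthStreamingMode(buffer.size());
                wire = conn.getOutputStream();
                buffer.writeTo(wire);
            }
            wire.close();
        }
    }

Small messages would then look exactly like today's non-chunked requests, so the older services keep working; the open question above only applies to anything bigger.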

Anyway, anyone else have some thoughts?


---
Daniel Kulp
[EMAIL PROTECTED]
http://www.dankulp.com/blog



