@Jay,
My bad. I mistook batch.size to be a number of messages instead of bytes.
Below are revised measurements based on computing batch.size in bytes.
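To make that correction concrete: batch.size is a byte limit, so a client batch of N records at 1k each occupies roughly N * 1024 bytes (ignoring per-record framing overhead; the sizes below mirror the 4k/8k/16k client batches in the tables and are illustrative):

```java
public class BatchBytes {
    public static void main(String[] args) {
        int recordSizeBytes = 1024;                             // 1k event size used in the tests
        int[] clientBatches = {4 * 1024, 8 * 1024, 16 * 1024};  // 4k/8k/16k records per batch

        for (int records : clientBatches) {
            // Approximate payload bytes; real batches add per-record overhead on top.
            long bytes = (long) records * recordSizeBytes;
            System.out.println(records + " records -> " + bytes + " bytes ("
                    + (bytes / (1024 * 1024)) + " MB)");
        }
    }
}
```

So a 16k-record client batch is already ~16MB of payload, which is why a 10MB or 20MB batch.size should comfortably cover (or exceed) what the client generates per flush.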
@Jun,
With an explicit flush(), linger.ms should have no impact. Isn't that right?
@Wang,
Larger batches are not necessarily giving better numbers, as you can see
below.
The two problems I noted earlier still exist in the batched sync mode (using
flush()):
* batch.size still seems to play a factor even when set to a value larger
than the bytes generated by the client
* 4 and 8 partitions see a big slowdown
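For clarity, the batched-sync pattern being measured is roughly the following: enqueue N async sends, then call flush() to block until all of them are acked. This is a sketch, not the actual test harness; the broker address, topic name, and sizes are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchedSyncSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("acks", "1");
        props.put("batch.size", Integer.toString(10 * 1024 * 1024)); // e.g. 10MB
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        byte[] payload = new byte[1024]; // 1k event size
        int clientBatch = 4 * 1024;      // 4k records per flush

        try (Producer<String, byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < clientBatch; i++) {
                // send() is asynchronous; records accumulate in the producer buffer
                producer.send(new ProducerRecord<>("perf-test", payload));
            }
            // flush() blocks until every buffered record has completed,
            // draining the accumulator regardless of the linger.ms timer.
            producer.flush();
        }
    }
}
```

This is also why linger.ms shouldn't matter in this mode: flush() forces the buffered batches out whether or not the linger timer has expired.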
Revised measurements for new Producer API:
- All cases...Single threaded, 1k event size
Batched SYNC using flush(), acks=1
1 partition
                            Batch=4k   Batch=8k   Batch=16k
batch.size == clientBatch      140        124
batch.size = 10MB              140        123        124
batch.size = 20MB               31         30         42
4 partitions
                            Batch=4k   Batch=8k   Batch=16k
batch.size == clientBatch       60          8          6
batch.size = 10MB                7          7          7
batch.size = 20MB                6          6          5
8 partitions
                            Batch=4k   Batch=8k   Batch=16k
batch.size == clientBatch        7          8          8
batch.size = 10MB                7          8          7
batch.size = 20MB                6          6          6
Just for reference, I also took the numbers for the default ASYNC mode with acks=1:
                batch.size=default   batch.size=4MB   batch.size=8MB   batch.size=16MB
1 partition             53                130              113               76
4 partitions            84                126                9                7
8 partitions             9                 12               10                5