The create time can be improved.  I think the issue is that the benchmark 
forces Avro to create a lot more objects.

Others can take a String and encode it directly into the resulting byte 
output.  We have to take a String, encode it into a byte[] (in a Utf8), copy 
that to the output, and then throw away the Utf8.  We could recycle the byte[] 
buffers from Utf8s (say, with a thread-local byte[] buffer cache like the one 
Jackson uses), or allow Strings to be written and read directly by the encoder 
and decoder alongside Utf8s.  Our challenge is that we must write the length 
of the string before its bytes, and that length is not available until the 
string has been converted to UTF-8.
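
To make the buffer-recycling idea concrete, here is a rough sketch, not 
actual Avro code.  The class and method names (StringWriteSketch, 
writeString, writeLong) are made up for illustration.  It assumes the 
string layout from the Avro spec (a zig-zag varint length followed by the 
UTF-8 bytes) and encodes into a reused thread-local byte[] so the length is 
known before anything is written to the output:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not part of Avro.
final class StringWriteSketch {
    // Per-thread scratch buffer, reused across calls instead of
    // allocating a new byte[] (inside a new Utf8) for every string.
    private static final ThreadLocal<byte[]> SCRATCH =
        ThreadLocal.withInitial(() -> new byte[256]);

    // Per-thread encoder, so no encoder is allocated per string either.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE));

    // Writes a long as a zig-zag variable-length integer.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Encodes s into the scratch buffer first, so the byte length is
    // known, then writes the length followed by the bytes.
    static void writeString(String s, ByteArrayOutputStream out) {
        // UTF-8 needs at most 3 bytes per UTF-16 code unit.
        int max = s.length() * 3;
        byte[] buf = SCRATCH.get();
        if (buf.length < max) {
            buf = new byte[max];
            SCRATCH.set(buf);
        }
        CharsetEncoder enc = ENCODER.get();
        enc.reset();
        // The two wrap() calls still allocate small wrapper objects;
        // a real implementation would want to avoid those as well.
        ByteBuffer bb = ByteBuffer.wrap(buf);
        enc.encode(CharBuffer.wrap(s), bb, true);
        enc.flush(bb);
        int len = bb.position();
        writeLong(len, out);
        out.write(buf, 0, len);
    }
}
```

This still copies the bytes once from the scratch buffer to the output.  
The alternative is to walk the String twice (once to compute the UTF-8 
length, once to encode straight into the output), which trades the copy for 
a second pass over the characters.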

Because of the way the test is partitioned, some of our serialize time ends up 
in the create time -- others do the UTF-16 to UTF-8 conversion while 
serializing, but we do it in the 'create' phase.

Furthermore, on the Java side I think there is a lot of room for further 
improvement in raw serialization and deserialization, but not much of it is 
easy, and most of it has to do with more complicated schemas.

The benchmark setup is suspect -- last I checked it used an inappropriate heap 
size and the code comments around its 'warmup' process were misguided.

-Scott

On Apr 22, 2010, at 8:51 AM, Doug Cutting wrote:

> Avro seems to be sliding a bit in this benchmark.  The poor "create" 
> time has always been a problem for Avro, although I'm not sure why. 
> This isn't a great benchmark, but lots of folks look at it, so it'd be 
> nice if we did well there.
> 
> Doug
> 
> -------- Original Message --------
> Subject: New benchmarking page.
> Date: Thu, 22 Apr 2010 04:34:04 -0700
> From: Kannan Goundan <kan...@cakoose.com>
> Reply-To: java-serialization-benchmark...@googlegroups.com
> To: java-serialization-benchmark...@googlegroups.com
> 
> I've created a "version 2" of the Benchmarking page.
> 
>    http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
> 
> These measurements were generated using the new code I've been adding
> over the past month or so.  One advantage of the new code is that I've
> actually tried to make the various serializers do the same amount of
> work (previously, many serializers were specialized to the exact data
> value being tested).
