[jira] Commented: (AVRO-315) Performance improvements to BinaryDecoder

Scott Carey (JIRA) Wed, 13 Jan 2010 17:23:18 -0800

    [ https://issues.apache.org/jira/browse/AVRO-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800068#action_12800068 ]


Scott Carey commented on AVRO-315:
----------------------------------

I could not reproduce the performance improvement from unrolling the loop.  I 
was able to remove a conditional from the loop, which had a small impact, but 
the result is ugly:

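{code}
int n = 0;
int b;
int shift = 0;
do {
  b = in.read();  // 0..255, or -1 at end of stream
  n |= (b & 0x7f) << shift;
  shift += 7;
  // Continue only when b is a real byte with its high (continuation) bit set;
  // this single comparison is also false when b == -1 (EOF).
} while (((b ^ 0xFFFFFF00) & 0xFFFFFF80) == 0xFFFFFF80);

if (-1 == b) {
  throw new EOFException();
}
return n;
{code}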
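
To make the combined loop condition easier to trust, here is a small sketch that checks it against a more explicit two-part test for every value InputStream.read() can return (-1 through 255).  The class and method names are hypothetical and not part of the patch, and the "plain" form is only my reading of what the single comparison stands in for.

```java
// Hypothetical helper for illustration; not part of the attached patch.
final class VarIntLoopCondition {
  // The single-comparison continuation test used in the loop above.
  static boolean combined(int b) {
    return ((b ^ 0xFFFFFF00) & 0xFFFFFF80) == 0xFFFFFF80;
  }

  // Explicit form (an assumption): not EOF, and the continuation bit is set.
  static boolean plain(int b) {
    return b != -1 && (b & 0x80) != 0;
  }

  // Counts the values in -1..255 for which the two forms disagree.
  static int mismatches() {
    int count = 0;
    for (int b = -1; b <= 255; b++) {
      if (combined(b) != plain(b)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println("mismatches: " + mismatches());
  }
}
```

The check only covers the range read() is documented to return; it says nothing about whether the trick is actually faster on a given JVM.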

I have modified the test to serialize a broader set of integers, run for 
longer, and warm up for longer.  It is now nearly stable; the JVM is usually 
done compiling before the timed portion starts.  I run the Integer and Long 
tests twice to help gauge the stability.

Environment: OS X Snow Leopard, Java 1.6.0_15, -server -XX:+UseCompressedOops.
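
As a rough illustration of the warmup-then-measure structure described above, here is a minimal sketch.  It is not the test class from the patch: WarmupHarnessSketch, workload(), and the run counts are made up for the example, and the workload is a trivial stand-in for decoding.

```java
// Hypothetical timing harness; names and run counts are illustrative only.
final class WarmupHarnessSketch {
  // Stand-in workload: sums the low 7 bits of every byte.
  static long workload(byte[] data) {
    long sum = 0;
    for (byte b : data) {
      sum += b & 0x7f;
    }
    return sum;
  }

  // Runs untimed warmup passes so the JIT can compile the hot path, then
  // times the remaining passes.  Returns elapsed nanoseconds and stores a
  // checksum in checksumOut[0] so the work cannot be optimized away.
  static long timeNanos(byte[] data, int warmupRuns, int timedRuns,
                        long[] checksumOut) {
    long sink = 0;
    for (int i = 0; i < warmupRuns; i++) {
      sink += workload(data);
    }
    long start = System.nanoTime();
    for (int i = 0; i < timedRuns; i++) {
      sink += workload(data);
    }
    long elapsed = System.nanoTime() - start;
    checksumOut[0] = sink;
    return elapsed;
  }

  public static void main(String[] args) {
    byte[] data = new byte[1 << 16];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    long[] checksum = new long[1];
    // Measure twice; similar timings suggest the run has stabilized.
    for (int round = 0; round < 2; round++) {
      long ns = timeNanos(data, 200, 1000, checksum);
      System.out.println("round " + round + ": " + (ns / 1000000) + " ms");
    }
  }
}
```

Running the measurement twice in the same JVM, as in main() here, is a cheap way to see whether compilation was still in progress during the first timed pass.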

Reviewing the code, the primary problem looks to be the ubiquitous use of this 
slow function:

{code}
InputStream.read();
{code}

Unfortunately, this old stream API is slow when used this way.  It is 
optimized for bulk transfer and performs very poorly when reading one or a few 
bytes at a time.  Ideally one should use the bulk read methods on InputStream 
and its relatives, such as read(byte[], int, int), whenever possible.
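
For reference, a bulk read looks roughly like the sketch below.  BulkReadSketch and readUpTo() are hypothetical names for this example; only InputStream.read(byte[], int, int) is the real API.  The loop is needed because a single bulk read may return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of the attached patch.
final class BulkReadSketch {
  // Fills dst[0..len) using bulk reads.  Returns the number of bytes
  // actually read, which is less than len only if the stream ended first.
  static int readUpTo(InputStream in, byte[] dst, int len) throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(dst, total, len - total);
      if (n < 0) {
        break; // end of stream
      }
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
    byte[] dst = new byte[8];
    int got = readUpTo(in, dst, dst.length);
    System.out.println("bytes read: " + got);
  }
}
```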

I made a quick and dirty stab at removing that problem by adding a byte[] 
buffer in this class and buffering access to the input stream, allowing array 
access to retrieve bytes.

The result is about 2x to 3x as fast!

I'm attaching a patch that contains a test class (modified from Thiru's) and a 
prototype, "FastBinaryDecoder.java".  This is a prototype to share: it needs 
cleanup before it can be merged into BinaryDecoder, and there is room for 
improvement on top of it.  Keeping it in a separate class also makes it easier 
to test one implementation against the other; the new test class needs only a 
one-line change to switch between them.

The idea is to use array accessors as often as possible, and to reduce the 
number of times that bounds have to be checked.  There are ways we could 
reduce this even further.

The nio ByteBuffer API is smarter about this sort of thing than InputStream, 
but raw byte[] access tends to be the fastest, even if it means making an extra 
copy.  An extra 2k copy is almost free (it fits in L1 cache): two or three 
orders of magnitude faster than a large in-process memcpy or a copy from out 
of process.
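
A minimal sketch of the buffering idea follows.  This is not the FastBinaryDecoder from the patch: the class name, fields, and buffer handling are assumptions made for illustration, zig-zag decoding is left out as in the loop quoted earlier, and there is no guard against malformed input longer than five bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not the FastBinaryDecoder from the attached patch.
final class BufferedVarIntReader {
  private final InputStream in;
  private final byte[] buf;
  private int pos = 0;   // index of the next unread byte in buf
  private int limit = 0; // number of valid bytes in buf

  BufferedVarIntReader(InputStream in, int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    this.in = in;
    this.buf = new byte[bufferSize];
  }

  // Refills the buffer with one bulk read; called only when it is empty.
  private void fill() throws IOException {
    int n;
    do {
      n = in.read(buf, 0, buf.length);
    } while (n == 0);
    if (n < 0) {
      throw new EOFException();
    }
    pos = 0;
    limit = n;
  }

  // Reads one variable-length int: 7 bits per byte, low-order group first,
  // with the high bit of each byte marking that more bytes follow.
  int readVarInt() throws IOException {
    int n = 0;
    int shift = 0;
    int b;
    do {
      if (pos == limit) {
        fill();
      }
      b = buf[pos++] & 0xff; // plain array access instead of in.read()
      n |= (b & 0x7f) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return n;
  }

  public static void main(String[] args) throws IOException {
    // 0xAC 0x02 encodes 300; 0x01 encodes 1.
    byte[] data = {(byte) 0xAC, 0x02, 0x01};
    BufferedVarIntReader r =
        new BufferedVarIntReader(new ByteArrayInputStream(data), 2048);
    System.out.println(r.readVarInt()); // prints 300
    System.out.println(r.readVarInt()); // prints 1
  }
}
```

This version still checks pos against limit once per byte.  One way to reduce that further would be to refill up front whenever fewer than five bytes remain, so the inner loop can run without the check, at the cost of more careful handling near end of stream.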

Sample test output below:

{noformat}
BinaryDecoder (modified, including Thiru's changes but without the loop unrolling):
-----
ReadInt({"type":"array","items":"int"}): 945 ms, 31.731626594910882 million numbers decoded /sec
ReadLong({"type":"array","items":"long"}): 1434 ms, 20.92050209205021 million numbers decoded /sec
ReadInt({"type":"array","items":"int"}): 831 ms, 36.08644869649733 million numbers decoded /sec
ReadLong({"type":"array","items":"long"}): 1430 ms, 20.972303975370128 million numbers decoded /sec
ReadFloat({"type":"array","items":"float"}): 636 ms, 47.09687041296106 million numbers decoded /sec
ReadDouble({"type":"array","items":"double"}): 772 ms, 38.81877068716988 million numbers decoded /sec

FastBinaryDecoder:
--------
ReadInt({"type":"array","items":"int"}): 298 ms, 100.49746243907342 million numbers decoded /sec
ReadLong({"type":"array","items":"long"}): 793 ms, 37.79032416540069 million numbers decoded /sec
ReadInt({"type":"array","items":"int"}): 291 ms, 102.76084126875385 million numbers decoded /sec
ReadLong({"type":"array","items":"long"}): 806 ms, 37.2056106060794 million numbers decoded /sec
ReadFloat({"type":"array","items":"float"}): 324 ms, 92.45904064499426 million numbers decoded /sec
ReadDouble({"type":"array","items":"double"}): 345 ms, 86.7109663359125 million numbers decoded /sec
{noformat}


> Performance improvements to BinaryDecoder
> -----------------------------------------
>
>                 Key: AVRO-315
>                 URL: https://issues.apache.org/jira/browse/AVRO-315
>             Project: Avro
>          Issue Type: Improvement
>            Reporter: Thiruvalluvan M. G.
>            Assignee: Thiruvalluvan M. G.
>         Attachments: AVRO-315-test.patch, AVRO-315.patch, AVRO-315.patch
>
>
> The forthcoming patch improves the performance of BinaryDecoder.readLong(), 
> readFloat() and readDouble().
> The test-patch has a command-line program Perf in org.apache.avro.io (in the 
> test part of the source directory) which tests the performance of readInt() 
> (which calls readLong()), readFloat() and readDouble(). On my machine, the 
> patch improves the performance by 10% for readInt() and about 50% for 
> readFloat() and readDouble().
> The idea is to unroll the loops in readLong(), readFloat() and readDouble(). 
> There is a small change in doReadBytes() which checks for the most common 
> condition before less common ones.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
