[ 
https://issues.apache.org/jira/browse/IO-649?focusedWorklogId=377699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377699
 ]

ASF GitHub Bot logged work on IO-649:
-------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/20 15:28
            Start Date: 27/Jan/20 15:28
    Worklog Time Spent: 10m 
      Work Description: brettlounsbury commented on issue #101: IO-649 - 
Improve the performance of the contentEquals() methods.
URL: https://github.com/apache/commons-io/pull/101#issuecomment-578802448
 
 
   @garydgregory,
   
   I really appreciate your thorough review of my code.  I think at this point 
I have addressed your comments, but if you have any others please let me know.
   
   One note: I made the IOUtils.DEFAULT_BUFFER_SIZE constant package-private 
(default visibility) so that the IOUtils tests can reference it.  This lets the 
tests verify that the logic behaves correctly around buffer boundaries.  I did 
not want to copy the value into the tests, since the constant could be changed 
later and silently invalidate some of the edge-condition testing.
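
   To illustrate the kind of buffer-boundary testing described above, here is a 
minimal, self-contained sketch.  It does not use the actual IOUtils code or 
tests: BUFFER_SIZE is a hypothetical stand-in for IOUtils.DEFAULT_BUFFER_SIZE, 
and BufferBoundaryCheck, fill, sameChars, repeat and boundaryChecksPass are 
names invented for this example.  It compares Readers block by block and then 
exercises input lengths just below, at, and just above the buffer size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BufferBoundaryCheck {

    // Hypothetical stand-in for IOUtils.DEFAULT_BUFFER_SIZE; the real value may differ.
    static final int BUFFER_SIZE = 4096;

    // Reads until buf is full or the reader is exhausted; returns the number of chars read.
    static int fill(Reader in, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Block-wise comparison: full blocks use Arrays.equals, the final partial block a short loop.
    static boolean contentEquals(Reader a, Reader b) throws IOException {
        char[] bufA = new char[BUFFER_SIZE];
        char[] bufB = new char[BUFFER_SIZE];
        while (true) {
            int nA = fill(a, bufA);
            int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one input ended before the other
            }
            if (nA < BUFFER_SIZE) {
                // Final (possibly empty) block: compare only the chars actually read.
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
                return true;
            }
            if (!Arrays.equals(bufA, bufB)) {
                return false;
            }
        }
    }

    // Convenience wrapper so callers do not have to handle IOException.
    static boolean sameChars(String x, String y) {
        try {
            return contentEquals(new StringReader(x), new StringReader(y));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String repeat(char c, int n) {
        char[] cs = new char[n];
        Arrays.fill(cs, c);
        return new String(cs);
    }

    // Exercises lengths around the buffer boundary; returns true if every check holds.
    static boolean boundaryChecksPass() {
        int[] sizes = {0, 1, BUFFER_SIZE - 1, BUFFER_SIZE, BUFFER_SIZE + 1,
                       2 * BUFFER_SIZE, 2 * BUFFER_SIZE + 1};
        for (int size : sizes) {
            String base = repeat('a', size);
            if (!sameChars(base, base)) {
                return false; // identical content must compare equal
            }
            if (sameChars(base, base + "b")) {
                return false; // one extra trailing char must be detected
            }
            if (size > 0) {
                String changed = base.substring(0, size - 1) + "b";
                if (sameChars(base, changed)) {
                    return false; // a difference in the last char must be detected
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("boundary checks pass: " + boundaryChecksPass());
    }
}
```

   Because the sizes are derived from the constant rather than hard-coded, the 
checks keep targeting the block boundaries even if the buffer size changes, 
which is the reason given above for referencing the constant directly.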
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 377699)
    Time Spent: 4h 40m  (was: 4.5h)

> IOUtils contentEquals method performance improvements
> -----------------------------------------------------
>
>                 Key: IO-649
>                 URL: https://issues.apache.org/jira/browse/IO-649
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.0, 1.1
>            Reporter: Brett Lounsbury
>            Priority: Major
>             Fix For: 2.6
>
>          Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
>  
> contentEquals() internally wraps any given InputStream/Reader in a Buffered 
> version (if it is not already buffered), which avoids a lot of I/O penalties, 
> but it then reads each byte/character one at a time.  This leads to 
> significantly more method calls and also a lot of byte -> int casting, since 
> InputStream.read() returns an int between 0 and 255 (or -1 at end of stream) 
> instead of returning a byte.
>  
> I have a change that modifies the contentEquals() methods to internally 
> buffer content into a byte/char array and to then do batch comparisons of 
> those arrays using Arrays.equals instead of using a BufferedInputStream or 
> BufferedReader and making use of the single byte/char read() methods.  This 
> reduces the number of method invocations by a factor equal to the buffer size 
> and avoids casting every byte read to an int.
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing two 1 GB InputStreams of binary data (stored in memory to avoid 
> I/O).  The test was performed on an EC2 M4.4XL host using Java 1.8.0.232, 
> with a forced System.gc() between iterations to avoid GC as a source of 
> latency:
>
> Metric    Before (ms)   After (ms)   Speedup
> Average   7236          858          8.43x
> P50       7224          856          8.44x
> P90       7249          860          8.43x
> P99       7410          913          8.12x
> P100      8330          1278         6.52x
>  
> The following table shows the performance increase over 1000 iterations of 
> comparing two 1 GB Readers of character data (stored in memory to avoid 
> I/O).  The test was performed on an EC2 M4.4XL host using Java 1.8.0.232, 
> with a forced System.gc() between iterations to avoid GC as a source of 
> latency:
>
> Metric    Before (ms)   After (ms)   Speedup
> Average   11281         1737         6.50x
> P50       11262         1735         6.49x
> P90       11292         1741         6.49x
> P99       11707         1774         6.60x
> P100      12176         1884         6.46x
>  
>  
>  
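
The batch comparison described in the issue text above can be sketched roughly 
as follows for InputStreams.  This is an illustration under stated assumptions, 
not the code from the pull request: the class name BlockContentEquals, the 
helpers fill and sameBytes, and the BUFFER_SIZE value are all hypothetical.  
Full blocks are compared with Arrays.equals; the final partial block is 
compared with a short loop so that stale bytes left in the buffers from an 
earlier block are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BlockContentEquals {

    // Hypothetical buffer size for this sketch; not necessarily the value used by IOUtils.
    static final int BUFFER_SIZE = 8192;

    // Reads until buf is full or the stream is exhausted; returns the number of bytes read.
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Compares two streams one block at a time instead of one byte at a time.
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        if (a == b) {
            return true;
        }
        byte[] bufA = new byte[BUFFER_SIZE];
        byte[] bufB = new byte[BUFFER_SIZE];
        while (true) {
            int nA = fill(a, bufA);
            int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one stream ended before the other
            }
            if (nA < BUFFER_SIZE) {
                // Final (possibly empty) block: compare only the bytes actually read.
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) {
                        return false;
                    }
                }
                return true;
            }
            // Full block: a single bulk comparison replaces BUFFER_SIZE read() calls.
            if (!Arrays.equals(bufA, bufB)) {
                return false;
            }
        }
    }

    // Convenience wrapper for in-memory data, so callers need not handle IOException.
    static boolean sameBytes(byte[] x, byte[] y) {
        try {
            return contentEquals(new ByteArrayInputStream(x), new ByteArrayInputStream(y));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] x = new byte[BUFFER_SIZE + 5];
        byte[] y = x.clone();
        System.out.println(sameBytes(x, y)); // prints "true"
        y[y.length - 1] = 1;
        System.out.println(sameBytes(x, y)); // prints "false"
    }
}
```

The fill helper keeps reading until a block is full because a single 
read(byte[], int, int) call may return fewer bytes than requested; without it, 
two streams with identical content but different chunking could be reported as 
unequal.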



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
