[ 
https://issues.apache.org/jira/browse/IO-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311285#comment-14311285
 ] 

Bernd Hopp commented on IO-468:
-------------------------------

Testing was more difficult than I expected, but here are some results regarding 
raw performance; memory-allocation and multithreading tests are in the 
making. 
The results, if they are correct, are spectacular. Average performance with 
threadlocals is about 15 times better. In one test, performance was 86 times 
better than without threadlocals. This was probably caused by a garbage 
collector run, but the overall performance increase is nonetheless impressive. 
There are 3 tests where performance decreased, in run 1 with stream sizes 
33554432, 67108864 and 134217728. This did not happen in runs 2 and 3, so I 
assume these results are statistical outliers. Generally, the performance 
difference shrinks as stream size increases, which is not surprising given 
that with larger streams, the cost of buffer allocation makes up a smaller 
share of the total copy time.
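
To illustrate why the gap should narrow for larger streams, here is a minimal 
sketch. It is not the attached PerfTest.java; the class name, the 4096-byte 
buffer size, the stream sizes and the round count are all assumptions chosen 
for illustration. It copies in-memory data either with a freshly allocated 
buffer per copy or with one reused buffer. The timing is naive (no warm-up, no 
JMH), so treat any numbers it prints as indicative at best.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Illustrative only; this is NOT the attached PerfTest.java. */
final class BufferReuseSketch {

    static final int BUFFER_SIZE = 4096; // assumed buffer size

    /** Sink that throws the data away, so only the copy loop is measured. */
    static final class DiscardingOutputStream extends OutputStream {
        @Override public void write(int b) { }
        @Override public void write(byte[] b, int off, int len) { }
    }

    /** Copies in to out through the given buffer and returns the bytes copied. */
    static long copy(InputStream in, OutputStream out, byte[] buffer) throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Times `rounds` copies of `data`, reusing one buffer or allocating one per copy. */
    static long timeNanos(byte[] data, int rounds, boolean reuse) throws IOException {
        byte[] shared = new byte[BUFFER_SIZE];
        OutputStream sink = new DiscardingOutputStream();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            byte[] buffer = reuse ? shared : new byte[BUFFER_SIZE];
            copy(new ByteArrayInputStream(data), sink, buffer);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Stream sizes from 1 byte to 1 MiB, growing by a factor of 16.
        for (int size = 1; size <= 1 << 20; size <<= 4) {
            byte[] data = new byte[size];
            long fresh = timeNanos(data, 1000, false);
            long reused = timeNanos(data, 1000, true);
            System.out.printf("size=%d fresh=%dns reused=%dns%n", size, fresh, reused);
        }
    }
}
```

The allocation is a fixed cost per copy, while the copy loop grows with the 
stream size, so the relative difference between the two columns would be 
expected to shrink for the larger sizes.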

Here's how to reproduce the tests: 

1. copy the attached PerfTest.java to your home directory
2. open console
3. git clone https://github.com/berndhopp/commons-io.git
4. cd commons-io
5. git checkout origin/introduce_threadlocal_buffers_to_avoid_memory_allocation
6. cp ~/PerfTest.java src/main/java/org/apache/commons/io/
7. mvn clean compile exec:java -Dexec.mainClass="org.apache.commons.io.PerfTest" -Dexec.args="28 4096 with_threadlocal_1"
8. mvn clean compile exec:java -Dexec.mainClass="org.apache.commons.io.PerfTest" -Dexec.args="28 4096 with_threadlocal_2"
9. mvn clean compile exec:java -Dexec.mainClass="org.apache.commons.io.PerfTest" -Dexec.args="28 4096 with_threadlocal_3"
10. git checkout origin/trunk
11. mvn clean compile exec:java -Dexec.mainClass="org.apache.commons.io.PerfTest" -Dexec.args="28 4096 without_threadlocal_1"
12. mvn clean compile exec:java -Dexec.mainClass="org.apache.commons.io.PerfTest" -Dexec.args="28 4096 without_threadlocal_2"
13. mvn clean compile exec:java -Dexec.mainClass="org.apache.commons.io.PerfTest" -Dexec.args="28 4096 without_threadlocal_3"
14. open all .csv files in commons-io and the attached performancetest.ods in 
your favourite office suite
15. merge the files:
     - copy cells B2 to B30 from with_threadlocal_1.csv to sheet 'run 1' in 
performancetest.ods, cells B2 to B30
     - copy cells B2 to B30 from without_threadlocal_1.csv to sheet 'run 1' in 
performancetest.ods, cells C2 to C30
     - copy cells B2 to B30 from with_threadlocal_2.csv to sheet 'run 2' in 
performancetest.ods, cells B2 to B30
     - copy cells B2 to B30 from without_threadlocal_2.csv to sheet 'run 2' in 
performancetest.ods, cells C2 to C30
     - copy cells B2 to B30 from with_threadlocal_3.csv to sheet 'run 3' in 
performancetest.ods, cells B2 to B30
     - copy cells B2 to B30 from without_threadlocal_3.csv to sheet 'run 3' in 
performancetest.ods, cells C2 to C30

16. let me know if you can reproduce the results.



> Avoid allocating memory for method internal buffers, use threadlocal memory 
> instead
> -----------------------------------------------------------------------------------
>
>                 Key: IO-468
>                 URL: https://issues.apache.org/jira/browse/IO-468
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 2.4
>         Environment: all environments
>            Reporter: Bernd Hopp
>            Priority: Minor
>              Labels: newbie, performance
>             Fix For: 2.5
>
>         Attachments: PerfTest.java, performancetest.ods
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> In a lot of places, we allocate new buffers dynamically via new byte[]. This 
> is a performance drawback since many of these allocations could be avoided if 
> we used thread-local buffers that can be reused. For example, consider 
> the following code from IOUtils.java, ln 2177:
> return copyLarge(input, output, inputOffset, length, new byte[DEFAULT_BUFFER_SIZE]);
> This code allocates new memory for every copy process. That memory is not used 
> outside of the method and could easily and safely be reused, as long as it is 
> thread-local. So instead of allocating new memory, a new utility class could 
> provide a thread-local byte array like this:
> byte[] buffer = ThreadLocalByteArray.ofSize(DEFAULT_BUFFER_SIZE);
> return copyLarge(input, output, inputOffset, length, buffer);
> I have not measured the performance benefits yet, but I would expect them to 
> be significant, especially when the streams themselves are not the performance 
> bottleneck. 
> Git PR is at https://github.com/apache/commons-io/pull/6/files
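
For readers without the PR at hand, the utility class described above might 
look roughly like the following. This is a hedged sketch, not the code from 
the pull request: only the name ThreadLocalByteArray and the call 
ofSize(int) appear in the issue text. The caching strategy shown here (one 
buffer per thread, replaced by a larger one on demand) is an assumption.

```java
/**
 * Hypothetical helper. Only the class name and ofSize(int) come from the
 * issue description; the implementation below is an assumed design.
 */
final class ThreadLocalByteArray {

    // One cached buffer per thread (assumption: a single buffer is enough).
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    private ThreadLocalByteArray() {
        // static utility, not instantiable
    }

    /**
     * Returns a buffer of at least the requested size that belongs to the
     * calling thread. The contents are not cleared between calls.
     */
    static byte[] ofSize(final int size) {
        byte[] buffer = BUFFER.get();
        if (buffer == null || buffer.length < size) {
            // First call on this thread, or the cached buffer is too small.
            buffer = new byte[size];
            BUFFER.set(buffer);
        }
        return buffer;
    }
}
```

One caveat with a design like this: if a method holding the buffer calls 
another method on the same thread that also requests it, both end up writing 
to the same array. The buffer also stays reachable for as long as the thread 
lives, which matters for long-lived pooled threads.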



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
