[jira] [Commented] (IO-468) Avoid allocating memory for method internal buffers, use threadlocal memory instead

Thomas Neidhart (JIRA) Mon, 09 Feb 2015 12:58:56 -0800

    [ 
https://issues.apache.org/jira/browse/IO-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312870#comment-14312870
 ]


Thomas Neidhart commented on IO-468:
------------------------------------

Here are some benchmarks I ran with my own test harness. The numbers are the 
number of executions of the test code within 100 ms, averaged over a total of 
10 runs.
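
For reference, the measurement scheme described above can be sketched roughly as follows. This is not the actual harness: the class and method names, the number of warm-up runs, the use of the sample standard deviation, and the placeholder workload are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;

public class ThroughputHarness {

    /** Counts how many times the task completes within the given time window. */
    public static long countExecutions(Runnable task, long windowMillis) {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        long count = 0;
        while (System.nanoTime() < deadline) {
            task.run();
            count++;
        }
        return count;
    }

    /** Runs some unrecorded warm-up windows, then records one count per run. */
    public static long[] measure(Runnable task, int warmupRuns, int runs, long windowMillis) {
        for (int i = 0; i < warmupRuns; i++) {
            countExecutions(task, windowMillis);
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = countExecutions(task, windowMillis);
        }
        return samples;
    }

    public static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (n - 1 in the denominator); an assumption. */
    public static double stdev(long[] samples) {
        if (samples.length < 2) {
            return 0.0;
        }
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        // Placeholder workload; the real tests call the copy methods under test.
        Runnable task = () -> {
            ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
            out.write(data, 0, data.length);
        };
        long[] samples = measure(task, 5, 10, 100);
        System.out.printf("mean=%.0f stdev=%.0f%n", mean(samples), stdev(samples));
    }
}
```

A fixed time window with a counted number of executions keeps each run short, but it is sensitive to GC pauses and JIT compilation, which is one reason the standard deviations below are not negligible.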

Copying a ByteArrayInputStream:

{noformat}
Stream Length=100
=====================================================
Method                          mean            stdev
-----------------------------------------------------
copy(i, o)                      493703          34613
copyLarge(i, o, arr)            12205585        234018
copyLarge(i, o, tl.get())       10590205        206625
Diff tl/arr     0.87x
Diff tl/plain   21.45x
=====================================================

Stream Length=1000
=====================================================
Method                          mean            stdev
-----------------------------------------------------
copy(i, o)                      502246          11686
copyLarge(i, o, arr)            5553711        159619
copyLarge(i, o, tl.get())       4880272        232972
Diff tl/arr     0.88x
Diff tl/plain   9.72x
=====================================================

Stream Length=10000
=====================================================
Method                          mean            stdev
-----------------------------------------------------
copy(i, o)                      317060          4488
copyLarge(i, o, arr)            253169         12052
copyLarge(i, o, tl.get())       522264         12864
Diff tl/arr     2.06x
Diff tl/plain   1.65x
=====================================================

Stream Length=100000
=====================================================
Method                          mean            stdev
-----------------------------------------------------
copy(i, o)                      47718           392
copyLarge(i, o, arr)            52298           447
copyLarge(i, o, tl.get())       51703           907
Diff tl/arr     0.99x
Diff tl/plain   1.08x
=====================================================

Stream Length=1000000
=====================================================
Method                          mean            stdev
-----------------------------------------------------
copy(i, o)                      4396            310
copyLarge(i, o, arr)            4483            420
copyLarge(i, o, tl.get())       4646             87
Diff tl/arr     1.04x
Diff tl/plain   1.06x
=====================================================
{noformat}

Reading a 3 MB file into memory:
{noformat}
=====================================================
Method                          mean            stdev
-----------------------------------------------------
copy(i, o)                      238             4
copyLarge(i, o, arr)            248             4
copyLarge(i, o, tl.get())       250             4
Diff tl/arr     1.01x
Diff tl/plain   1.05x
=====================================================
{noformat}

The performance clearly depends on whether the stream copying is IO-bound or 
not.
Even though I took care of warm-up runs, noise during execution can affect the 
results quite a lot, as you can see from the standard deviation and the fact 
that sometimes the ThreadLocal version is faster and sometimes the array 
version. So I would not trust my own benchmark too much in this regard; I just 
wanted to quickly disprove your benchmark.

The reason you see such amazing speedups is simply because you do not copy the 
streams correctly:

{code}
        @Override
        public void run()
        {
            for(int i = 0; i < runs; i++){
                try {
                    IOUtils.copy(inputStream, outputStream);
                } catch (IOException e) {
                    System.err.println(e.getMessage());
                }
            }
        }
{code}

You just call copy again and again on the same streams without resetting or 
re-initializing them. This means that after the first copy, all subsequent 
calls return immediately because the input stream is already exhausted. So the 
test results are just wrong.
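
To illustrate the effect, here is a minimal, self-contained sketch. The copy method is a plain stand-in for IOUtils.copy so that the example has no dependencies, and the class name, method names and the resetEachRun flag are made up for this example. It assumes a ByteArrayInputStream, whose reset() rewinds to the start when no mark has been set; other stream types would have to be re-opened instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class ResetBetweenRuns {

    /** Stand-in for IOUtils.copy: copies until end of stream, returns the byte count. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Copies the same input 'runs' times and returns the total bytes copied. */
    public static long copyRepeatedly(ByteArrayInputStream in, int runs, boolean resetEachRun) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            if (resetEachRun) {
                in.reset(); // rewind to the start (no mark was set)
            }
            try {
                total += copy(in, new ByteArrayOutputStream());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        // Without reset only the first run copies anything: prints 1000
        System.out.println(copyRepeatedly(new ByteArrayInputStream(data), 5, false));
        // With reset every run copies the full stream: prints 5000
        System.out.println(copyRepeatedly(new ByteArrayInputStream(data), 5, true));
    }
}
```

In the benchmark loop, the equivalent fix would be to reset or re-create both streams at the start of each iteration, so that every call to copy actually moves data.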

> Avoid allocating memory for method internal buffers, use threadlocal memory 
> instead
> -----------------------------------------------------------------------------------
>
>                 Key: IO-468
>                 URL: https://issues.apache.org/jira/browse/IO-468
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 2.4
>         Environment: all environments
>            Reporter: Bernd Hopp
>            Priority: Minor
>              Labels: newbie, performance
>             Fix For: 2.5
>
>         Attachments: PerfTest.java, monitoring_with_threadlocals.png, 
> monitoring_without_threadlocals.png, performancetest.ods, 
> performancetest_weakreference.ods
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> In a lot of places, we allocate new buffers dynamically via new byte[]. This 
> is a performance drawback since many of these allocations could be avoided if 
> we would use threadlocal buffers that can be reused. For example, consider 
> the following code from IOUtils.java, ln 2177:
> return copyLarge(input, output, inputOffset, length, new 
> byte[DEFAULT_BUFFER_SIZE]);
> This code allocates new memory for every copy process; that memory is not 
> used outside of the method and could easily and safely be reused, as long as 
> it is thread-local. So instead of allocating new memory, a new utility class 
> could provide a thread-local byte array like this:
> byte[] buffer = ThreadLocalByteArray.ofSize(DEFAULT_BUFFER_SIZE);
> return copyLarge(input, output, inputOffset, length, buffer);
> I have not measured the performance benefits yet, but I would expect them to 
> be significant, especially when the streams themselves are not the 
> performance bottleneck. 
> Git PR is at https://github.com/apache/commons-io/pull/6/files
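
For readers without the PR at hand, the thread-local buffer idea from the description could look roughly like the sketch below. This is not the code from the PR: the class name, the buffer() accessor and the 4096-byte size are assumptions, and copyLarge here is a simplified local stand-in, not the Commons IO method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class ThreadLocalBufferSketch {

    // Assumed buffer size for this sketch.
    static final int DEFAULT_BUFFER_SIZE = 4096;

    // One lazily created buffer per thread, reused across calls on that thread.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[DEFAULT_BUFFER_SIZE]);

    /** Returns the calling thread's shared buffer. */
    public static byte[] buffer() {
        return BUFFER.get();
    }

    /** Simplified stand-in for a copy method that takes a caller-supplied buffer. */
    public static long copyLarge(InputStream in, OutputStream out, byte[] buffer) {
        try {
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] data = new byte[10_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyLarge(new ByteArrayInputStream(data), out, buffer());
        System.out.println(copied); // prints 10000

        // The same thread always sees the same array ...
        System.out.println(buffer() == buffer()); // prints true

        // ... while another thread gets its own.
        byte[] mine = buffer();
        byte[][] other = new byte[1][];
        Thread t = new Thread(() -> other[0] = buffer());
        t.start();
        t.join();
        System.out.println(mine == other[0]); // prints false
    }
}
```

One caveat with this approach: the buffer is only safe while a single copy is active per thread. A nested copy on the same thread, for example inside a stream's own read or write method, would share the array with the outer copy.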



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
