[
https://issues.apache.org/jira/browse/IO-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312870#comment-14312870
]
Thomas Neidhart commented on IO-468:
------------------------------------
Some benchmarks I did with my own test harness. The numbers are the actual
number of executions of the test code within 100ms, averaged over a total of 10
runs.
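For readers who want to reproduce this kind of measurement, the following is a minimal sketch of a harness that counts executions per time window. It is not the harness used for the numbers below: the class name ThroughputHarness, its method names, the number of warm-up windows and the use of the sample standard deviation are all assumptions made for illustration.

```java
public class ThroughputHarness {

    // Counts how often the task completes within one time window.
    static long countExecutions(Runnable task, long windowMillis) {
        long windowNanos = windowMillis * 1_000_000L;
        long start = System.nanoTime();
        long count = 0;
        while (System.nanoTime() - start < windowNanos) {
            task.run();
            count++;
        }
        return count;
    }

    // Runs unmeasured warm-up windows first, then returns the count of each measured window.
    static long[] measure(Runnable task, int warmupRuns, int runs, long windowMillis) {
        for (int i = 0; i < warmupRuns; i++) {
            countExecutions(task, windowMillis);
        }
        long[] counts = new long[runs];
        for (int i = 0; i < runs; i++) {
            counts[i] = countExecutions(task, windowMillis);
        }
        return counts;
    }

    static double mean(long[] xs) {
        double sum = 0;
        for (long x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); an assumption, the
    // original harness may have used the population formula instead.
    static double stdev(long[] xs) {
        double m = mean(xs);
        double sumSq = 0;
        for (long x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    public static void main(String[] args) {
        byte[] src = new byte[1000];
        byte[] dst = new byte[1000];
        // Placeholder workload; a real run would copy a freshly reset stream here.
        Runnable task = () -> System.arraycopy(src, 0, dst, 0, src.length);
        long[] counts = measure(task, 3, 10, 100);
        System.out.printf("mean=%.0f stdev=%.0f%n", mean(counts), stdev(counts));
    }
}
```

Counting completed executions in a fixed window, rather than timing a fixed number of iterations, keeps the wall-clock cost of each run constant regardless of how fast the variant under test is.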
Copying a ByteArrayInputStream:
{noformat}
Stream Length=100
=====================================================
Method                              mean        stdev
-----------------------------------------------------
copy(i, o)                        493703        34613
copyLarge(i, o, arr)            12205585       234018
copyLarge(i, o, tl.get())       10590205       206625
Diff tl/arr                        0.87x
Diff tl/plain                     21.45x
=====================================================
Stream Length=1000
=====================================================
Method                              mean        stdev
-----------------------------------------------------
copy(i, o)                        502246        11686
copyLarge(i, o, arr)             5553711       159619
copyLarge(i, o, tl.get())        4880272       232972
Diff tl/arr                        0.88x
Diff tl/plain                      9.72x
=====================================================
Stream Length=10000
=====================================================
Method                              mean        stdev
-----------------------------------------------------
copy(i, o)                        317060         4488
copyLarge(i, o, arr)              253169        12052
copyLarge(i, o, tl.get())         522264        12864
Diff tl/arr                        2.06x
Diff tl/plain                      1.65x
=====================================================
Stream Length=100000
=====================================================
Method                              mean        stdev
-----------------------------------------------------
copy(i, o)                         47718          392
copyLarge(i, o, arr)               52298          447
copyLarge(i, o, tl.get())          51703          907
Diff tl/arr                        0.99x
Diff tl/plain                      1.08x
=====================================================
Stream Length=1000000
=====================================================
Method                              mean        stdev
-----------------------------------------------------
copy(i, o)                          4396          310
copyLarge(i, o, arr)                4483          420
copyLarge(i, o, tl.get())           4646           87
Diff tl/arr                        1.04x
Diff tl/plain                      1.06x
=====================================================
{noformat}
Reading a 3 MB file into memory:
{noformat}
=====================================================
Method                              mean        stdev
-----------------------------------------------------
copy(i, o)                           238            4
copyLarge(i, o, arr)                 248            4
copyLarge(i, o, tl.get())            250            4
Diff tl/arr                        1.01x
Diff tl/plain                      1.05x
=====================================================
{noformat}
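To make the three rows in the tables above concrete, here is a minimal sketch of what each variant presumably does. The class name CopyVariants, the TL field, the method names and the 4096-byte buffer size are assumptions for illustration, and copyLarge below is a local stand-in for IOUtils.copyLarge so that the sketch runs without commons-io on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyVariants {

    // Assumed buffer size for this sketch.
    static final int BUFFER_SIZE = 4096;

    // Hypothetical per-thread buffer, playing the role of "tl" in the tables.
    static final ThreadLocal<byte[]> TL =
            ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

    // Local stand-in for IOUtils.copyLarge(input, output, buffer).
    static long copyLarge(InputStream in, OutputStream out, byte[] buffer)
            throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // "copy(i, o)": a fresh buffer is allocated on every call.
    static long copyFresh(InputStream in, OutputStream out) throws IOException {
        return copyLarge(in, out, new byte[BUFFER_SIZE]);
    }

    // "copyLarge(i, o, arr)": the caller owns one buffer and passes it in.
    static long copyWithArray(InputStream in, OutputStream out, byte[] arr)
            throws IOException {
        return copyLarge(in, out, arr);
    }

    // "copyLarge(i, o, tl.get())": the buffer is looked up per thread.
    static long copyWithThreadLocal(InputStream in, OutputStream out)
            throws IOException {
        return copyLarge(in, out, TL.get());
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        byte[] arr = new byte[BUFFER_SIZE];
        // Every variant copies the same number of bytes; only the buffer source differs.
        System.out.println(copyFresh(new ByteArrayInputStream(data), new ByteArrayOutputStream()));
        System.out.println(copyWithArray(new ByteArrayInputStream(data), new ByteArrayOutputStream(), arr));
        System.out.println(copyWithThreadLocal(new ByteArrayInputStream(data), new ByteArrayOutputStream()));
    }
}
```

The only difference between the second and third variant is the ThreadLocal lookup, which is why the tl/arr ratio is the interesting number when judging the proposal.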
The performance clearly depends on whether the stream copying is IO-bound or
not: for the small in-memory streams, reusing a buffer is many times faster
than copy(i, o), while for streams of 100000 bytes and more and for the file
read, all three variants are within roughly 10% of each other.
Even though I did take care of warm-up runs, the noise during the execution can
affect performance quite a lot, as you can see from the standard deviation and
the fact that sometimes the ThreadLocal version is faster, sometimes the array
version. So I would not trust my own benchmark too much in this regard, but I
just wanted to quickly disprove your benchmark.
The reason you see such amazing speedups is simply because you do not copy the
streams correctly:
{code}
@Override
public void run()
{
    for(int i = 0; i < runs; i++){
        try {
            IOUtils.copy(inputStream, outputStream);
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
{code}
You call copy again and again on the same streams without resetting or
re-initializing them in between. After the first copy the input stream is
exhausted, so all subsequent calls return immediately without copying
anything. The test results are therefore simply wrong.
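A minimal sketch of the difference, assuming in-memory streams: the class name ResetBetweenCopies and its methods are hypothetical, and copy below is a local stand-in for IOUtils.copy so that the example runs without commons-io. It records how many bytes each iteration actually copies, first without and then with a reset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class ResetBetweenCopies {

    // Local stand-in for IOUtils.copy; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // The flawed pattern: the same input stream is reused without a reset.
    static long[] copyWithoutReset(byte[] data, int runs) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] copied = new long[runs];
        for (int i = 0; i < runs; i++) {
            copied[i] = copy(in, out);
        }
        return copied;
    }

    // One possible fix: rewind both streams before every iteration.
    static long[] copyWithReset(byte[] data, int runs) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        long[] copied = new long[runs];
        for (int i = 0; i < runs; i++) {
            in.reset();   // back to the mark, which is position 0 for this constructor
            out.reset();  // discard earlier output so the buffer does not keep growing
            copied[i] = copy(in, out);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        System.out.println(Arrays.toString(copyWithoutReset(data, 3))); // [1000, 0, 0]
        System.out.println(Arrays.toString(copyWithReset(data, 3)));    // [1000, 1000, 1000]
    }
}
```

For streams that do not support reset, such as a FileInputStream, the equivalent would be to open a new stream in every iteration.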
> Avoid allocating memory for method internal buffers, use threadlocal memory
> instead
> -----------------------------------------------------------------------------------
>
> Key: IO-468
> URL: https://issues.apache.org/jira/browse/IO-468
> Project: Commons IO
> Issue Type: Improvement
> Components: Utilities
> Affects Versions: 2.4
> Environment: all environments
> Reporter: Bernd Hopp
> Priority: Minor
> Labels: newbie, performance
> Fix For: 2.5
>
> Attachments: PerfTest.java, monitoring_with_threadlocals.png,
> monitoring_without_threadlocals.png, performancetest.ods,
> performancetest_weakreference.ods
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> In a lot of places, we allocate new buffers dynamically via new byte[]. This
> is a performance drawback, since many of these allocations could be avoided if
> we used thread-local buffers that can be reused. For example, consider
> the following code from IOUtils.java, ln 2177:
> return copyLarge(input, output, inputOffset, length, new
> byte[DEFAULT_BUFFER_SIZE]);
> This code allocates new memory for every copy process. That memory is not used
> outside of the method and could easily and safely be reused, as long as it is
> thread-local. So instead of allocating new memory, a new utility class could
> provide a thread-local byte array like this:
> byte[] buffer = ThreadLocalByteArray.ofSize(DEFAULT_BUFFER_SIZE);
> return copyLarge(input, output, inputOffset, length, buffer);
> I have not measured the performance benefits yet, but I would expect them to
> be significant, especially when the streams themselves are not the performance
> bottleneck.
> Git PR is at https://github.com/apache/commons-io/pull/6/files