On Sat, 15 Mar 2014, Bob Friesenhahn wrote:
I am still struggling to get GraphicsMagick to run as fast as it should on an
Illumos system (in this case OpenIndiana oi_151a9).
Previously, GraphicsMagick was entirely profiled and tuned on a 4-core AMD
system running Solaris 10. It still runs well on that system.
The OpenIndiana system has 16 cores (32 threads with hyper-threading).
GraphicsMagick usually runs 2X faster on a Linux system with prior-generation
Intel CPUs with 12 cores (a system which should be half as fast). With the
AMD Solaris 10 system and the modern Linux system, I see expected speedups
from adding threads but not on the OpenIndiana system.
I should clarify the above. The problematic situation is the case
where the software should be doing very little actual work. It
allocates a large buffer (e.g. 200MB) using libumem's 'malloc()' for
the data and then reads data from a file using fread(), doing a small
amount of processing as it transfers data linearly from the file to
memory. The input data is 1/2 the size of the allocated memory.
Then the memory is released and the program terminates. The reason
why this case is important is that this represents the baseline cost
to do anything further and the baseline cost is 2X more on Illumos
than Linux.
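
To make the baseline case concrete, here is a minimal sketch of that
access pattern. It is not GraphicsMagick code. The function names, the
64KB chunk size, and the byte-widening loop (standing in for the "small
amount of processing") are assumptions chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Discourages the compiler from discarding the stores into the buffer. */
static volatile unsigned char sink;

/* Wall-clock seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical baseline: allocate a buffer twice the size of the input
 * file, fread() the file linearly while widening each byte to two bytes,
 * then free the buffer. Returns elapsed seconds, or -1.0 on error.
 * On success *bytes_out holds the number of input bytes consumed.
 */
static double baseline_read(const char *path, size_t *bytes_out)
{
    static unsigned char chunk[64 * 1024];  /* arbitrary chunk size */
    double start = now_seconds();
    FILE *fp = fopen(path, "rb");
    long end;
    size_t in_size, done = 0, n;
    unsigned char *buf;

    if (fp == NULL)
        return -1.0;
    if (fseek(fp, 0L, SEEK_END) != 0 || (end = ftell(fp)) < 0) {
        fclose(fp);
        return -1.0;
    }
    rewind(fp);
    in_size = (size_t)end;

    /* Output is 2x the input; +1 avoids a zero-byte request. */
    buf = malloc(2 * in_size + 1);
    if (buf == NULL) {
        fclose(fp);
        return -1.0;
    }

    while (done < in_size &&
           (n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (n > in_size - done)          /* file grew while reading */
            n = in_size - done;
        for (size_t i = 0; i < n; i++) { /* the "small" processing step */
            buf[2 * (done + i)] = chunk[i];
            buf[2 * (done + i) + 1] = 0;
        }
        done += n;
    }
    if (done > 0)
        sink = buf[2 * done - 1];

    free(buf);
    fclose(fp);
    *bytes_out = done;
    return now_seconds() - start;
}
```

Calling baseline_read() on the same large file on each system gives the
number being compared here. The sketch uses whichever malloc() the
program is linked with, so testing libumem would mean linking with
-lumem or preloading it.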
If actual data processing takes place (i.e. CPU processing becomes the
bottleneck rather than I/O and initial memory allocation) then the
performance numbers do reflect the difference in underlying hardware
performance and all seems good.
The Linux VM system works rather differently than Illumos since Linux
VM relies on over-commit and Solaris does not. Perhaps Linux is much
faster to add memory to a process than Solaris is.
If the memory allocation under Linux is reduced by a factor of 2
(memory size is the same as input data size), then the run-time
decreases by a factor of 2 whereas with Illumos, the run-time is only
slightly diminished. In fact, with the decreased memory use, the
difference is more stark (e.g. Illumos 0.75s, Linux 0.26s).
One might think that the problem is with Illumos stdio, but if the
data is mmapped with a zero-copy approach, Illumos still exhibits
similar balkiness, though with somewhat better performance.
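
For reference, the mmap variant amounts to something like the following
sketch. The function name and the byte-sum loop are assumptions for
illustration, not the actual GraphicsMagick reader. The point is only
that the file pages are consumed in place instead of being copied
through stdio.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical zero-copy scan: map the file read-only and sum its bytes
 * directly from the mapping. Returns 0 on success, -1 on error (an empty
 * file counts as an error because a zero-length mapping is not allowed).
 */
static int mmap_scan(const char *path, size_t *bytes_out,
                     unsigned long *sum_out)
{
    struct stat st;
    void *map;
    const unsigned char *data;
    unsigned long sum = 0;
    size_t len;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;

    map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping remains valid after close */
    if (map == MAP_FAILED)
        return -1;

    data = map;
    for (size_t i = 0; i < len; i++)  /* touch every page, linearly */
        sum += data[i];

    munmap(map, len);
    *bytes_out = len;
    *sum_out = sum;
    return 0;
}
```

Timing mmap_scan() against a stdio-based read of the same file is one
way to separate stdio overhead from the cost of faulting in the pages.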
Bob
--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/