On Sat, 15 Mar 2014, Bob Friesenhahn wrote:

I am still struggling to get GraphicsMagick to run acceptably fast on an Illumos system (in this case OpenIndiana oi_151a9).

Previously, GraphicsMagick was entirely profiled and tuned on a 4-core AMD system running Solaris 10. It still runs well on that system.

The OpenIndiana system has 16 cores (32 threads with hyper-threading).

GraphicsMagick usually runs 2X faster on a Linux system with 12 cores of prior-generation Intel CPUs (a system which should be half as fast). On the AMD Solaris 10 system and the modern Linux system I see the expected speedups from adding threads, but not on the OpenIndiana system.

I should clarify the above. The problematic situation is the case where the software should be doing very little actual work. It allocates a large buffer (e.g. 200MB) for the data using libumem's 'malloc()' and then reads data from a file using fread(), doing a small amount of processing as it transfers data linearly from the file to memory. The input data is half the size of the allocated memory. Then the memory is released and the program terminates. This case is important because it represents the baseline cost of doing anything further, and that baseline cost is twice as high on Illumos as on Linux.
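To make the baseline case concrete, here is a minimal sketch of the pattern described above. It is not GraphicsMagick code: the function name baseline_read(), the 64KB chunk size, and the "write each input byte factor times" step are all assumptions chosen only so that the allocation is a multiple of the input size and every allocated byte gets written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not GraphicsMagick code): reads input_size bytes from
 * path with fread() into a malloc'ed buffer of input_size * factor bytes,
 * storing each input byte factor times so the whole allocation is written.
 * Returns elapsed wall-clock seconds, or -1.0 on failure or short read. */
double baseline_read(const char *path, size_t input_size, size_t factor,
                     unsigned long *sum_out)
{
    assert(factor > 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    unsigned char *dst = malloc(input_size * factor);
    FILE *fp = fopen(path, "rb");
    if (dst == NULL || fp == NULL) {
        if (fp != NULL)
            fclose(fp);
        free(dst);
        return -1.0;
    }

    unsigned char chunk[65536];   /* assumed staging size */
    unsigned long sum = 0;
    size_t done = 0;
    while (done < input_size) {
        size_t want = input_size - done;
        if (want > sizeof chunk)
            want = sizeof chunk;
        size_t got = fread(chunk, 1, want, fp);
        if (got == 0)
            break;                /* EOF or read error */
        for (size_t i = 0; i < got; i++) {
            for (size_t k = 0; k < factor; k++) {
                size_t idx = (done + i) * factor + k;
                dst[idx] = chunk[i];
                sum += dst[idx];  /* read back so the result is observable */
            }
        }
        done += got;
    }
    fclose(fp);
    free(dst);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (sum_out != NULL)
        *sum_out = sum;
    if (done != input_size)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it with factor 2 and then factor 1 on the same file gives the two configurations compared in this message. To exercise libumem on Illumos the program would be linked with -lumem or run with libumem preloaded.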

If actual data processing takes place (i.e. CPU processing becomes the bottleneck rather than I/O and initial memory allocation) then the performance numbers do reflect the difference in underlying hardware performance and all seems good.

The Linux VM system works rather differently than Illumos since Linux VM relies on over-commit and Solaris does not. Perhaps Linux is much faster to add memory to a process than Solaris is.
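One way to test that guess in isolation is to time only the allocation and first touch of each page, with no file I/O at all. The sketch below is an assumption about how one might measure it, not something I have measured; touch_pages() is a hypothetical name, and it writes one byte per page so the kernel has to back each page with real memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: malloc size bytes, write one byte in every page,
 * and report the elapsed wall-clock time through seconds_out.
 * Returns the number of pages touched, or -1 on failure. */
long touch_pages(size_t size, double *seconds_out)
{
    assert(seconds_out != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0)
        return -1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* volatile so the stores below are actually performed */
    volatile unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1;

    long touched = 0;
    for (size_t off = 0; off < size; off += (size_t)page) {
        buf[off] = 1;             /* first write faults the page in */
        touched++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free((void *)buf);

    *seconds_out = (double)(t1.tv_sec - t0.tv_sec) +
                   (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return touched;
}
```

Running this with a 200MB size on both systems would show whether adding memory to the process accounts for the gap, independent of stdio.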

If the memory allocation is reduced by a factor of 2 (so that memory size is the same as input data size), then under Linux the run-time decreases by a factor of 2, whereas under Illumos the run-time is only slightly diminished. In fact, with the decreased memory use, the difference is more stark (e.g. Illumos 0.75s, Linux 0.26s).

One might think that the problem is with Illumos stdio, but if the data is mmapped with a zero-copy approach, Illumos still exhibits similar balkiness, although with somewhat better performance.
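For reference, the mmap variant of the read path can be sketched as follows. This is an illustration only, assuming a read-only MAP_PRIVATE mapping of the whole file and a trivial byte sum standing in for the "small amount of processing"; mmap_scan() is a hypothetical name and the real code path in GraphicsMagick differs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file read-only and sum its bytes
 * directly from the mapping, with no stdio buffer in between.
 * Returns the number of bytes scanned, or -1 on failure or empty file. */
long mmap_scan(const char *path, unsigned long *sum_out)
{
    assert(sum_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];              /* each first access may fault a page in */

    munmap((void *)p, len);
    *sum_out = sum;
    return (long)len;
}
```

Since this path bypasses fread() entirely, a slowdown that persists here points away from stdio and toward the cost of page faults or memory mapping.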

Bob
--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/