Hi, thanks all for your input.
I want to clarify my intentions and try to give answers to the
questions and suggestions below.
We want to test ZFS throughput over Fibre Channel/COMSTAR
on two 32-core quad-socket systems, one target and one initiator.
Therefore I created a zpool with two 6-disk RAIDZ2 vdevs (1GB SAS) plus a ZeusRAM ZIL
on the FC target (2x Emulex LPe12002 / 8 Gbit), running OmniOS stable.
The initiator is another system with identical hardware and software;
the two are interconnected over a zoned Cisco fabric switch (9148).
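For reference, the pool on the target was created roughly along these lines (the device names below are placeholders for the actual SAS disks and the ZeusRAM, not the real ones):

zpool create jbod_a1 \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 \
    log c2t0d0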
1) The local dd performance on the target is around 750 MB/s, which is OK
for me.
time dd if=/dev/zero of=/jbod_a1/largefile bs=128k count=64k
65536+0 records in
65536+0 records out
8589934592 bytes (8,6 GB) copied, 11,0914 s, 774 MB/s
2) Exporting the large file / a ZVOL (I tested both; same performance)
as a LUN over 2 x 8 Gbit FC links and creating a zpool on it at the
initiator yields around 189 MB/s in the following dd test,
which is not OK for me.
zpool create fcvol1 c0t600144F00E7FCC0000005151C9A50003d0
time dd if=/dev/zero of=/fcvol1/file1 bs=128k count=64k
65536+0 records in
65536+0 records out
8589934592 bytes (8,6 GB) copied, 45,3851 s, 189 MB/s
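For completeness, the export on the target followed the usual COMSTAR steps, roughly like this (the ZVOL name and size are assumptions; the GUID is the one visible in the initiator device name above, and the view is left open to all hosts since the fabric is zoned):

svcadm enable stmf
zfs create -V 200g jbod_a1/vol1
sbdadm create-lu /dev/zvol/rdsk/jbod_a1/vol1    # prints the LU GUID
stmfadm add-view 600144f00e7fcc0000005151c9a50003

The file-backed variant would just point sbdadm create-lu at /jbod_a1/largefile instead of the ZVOL path.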
3) Eliminating the zpool disks on the target side by replacing them
with a RAM disk or a file in /tmp is not a good idea, as I learned
through the discussion. The observation that a dd test to tmpfs
(no FC involved) is much slower than on all my other systems is still
strange to me.
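For reference, the tmpfs comparison was just the same dd pointed at /tmp (same block size and count assumed as in the tests above):

time dd if=/dev/zero of=/tmp/largefile bs=128k count=64k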
Another observation: disabling hyper-threading gave a performance
increase of about 20% for the dd to tmpfs.
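Hyper-threading was presumably toggled in the BIOS; a quicker way to approximate the same experiment from within illumos would be to offline the sibling vCPUs, which per the lgrpinfo below are IDs 32-63 (assuming GNU seq is on the PATH):

psradm -f $(seq 32 63)    # -f takes the listed CPUs offline
psradm -n $(seq 32 63)    # bring them back online afterwards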
My next steps are firmware and driver updates for the LPe12002
cards, then eliminating the fabric switch by connecting the ports
directly, and then using other tools to investigate the problem.
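Some standard illumos tools that could narrow this down while the dd is running (nothing exotic assumed):

fcinfo hba-port    # confirm both LPe12002 ports negotiated 8 Gb
iostat -xn 1       # per-device throughput and service times
mpstat 1           # look for one saturated CPU or cross-socket imbalance
lockstat sleep 10  # kernel lock contention during the run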
I would be very grateful for any suggestions or help.
Thank you,
Franz Schober
For reference, here is the lgrpinfo output with hyper-threading disabled
and with it enabled.

With hyper-threading disabled (32 CPUs):
lgroup 0 (root):
Children: 3 4 6 8
CPUs: 0-31
Memory: installed 256G, allocated 14G, free 242G
Lgroup resources: 1 2 5 7 (CPU); 1 2 5 7 (memory)
Latency: 30
lgroup 1 (leaf):
Children: none, Parent: 3
CPUs: 0-7
Memory: installed 64G, allocated 1,7G, free 62G
Lgroup resources: 1 (CPU); 1 (memory)
Load: 0,00166
Latency: 10
lgroup 2 (leaf):
Children: none, Parent: 4
CPUs: 8-15
Memory: installed 64G, allocated 2,7G, free 61G
Lgroup resources: 2 (CPU); 2 (memory)
Load: 1,53e-05
Latency: 10
lgroup 3 (intermediate):
Children: 1, Parent: 0
CPUs: 0-15 24-31
Memory: installed 192G, allocated 13G, free 179G
Lgroup resources: 1 2 7 (CPU); 1 2 7 (memory)
Latency: 21
lgroup 4 (intermediate):
Children: 2, Parent: 0
CPUs: 0-23
Memory: installed 192G, allocated 5,1G, free 187G
Lgroup resources: 1 2 5 (CPU); 1 2 5 (memory)
Latency: 21
lgroup 5 (leaf):
Children: none, Parent: 6
CPUs: 16-23
Memory: installed 64G, allocated 658M, free 63G
Lgroup resources: 5 (CPU); 5 (memory)
Load: 0,0194
Latency: 10
lgroup 6 (intermediate):
Children: 5, Parent: 0
CPUs: 8-31
Memory: installed 192G, allocated 12G, free 180G
Lgroup resources: 2 5 7 (CPU); 2 5 7 (memory)
Latency: 21
lgroup 7 (leaf):
Children: none, Parent: 8
CPUs: 24-31
Memory: installed 64G, allocated 8,6G, free 55G
Lgroup resources: 7 (CPU); 7 (memory)
Load: 0,125
Latency: 10
lgroup 8 (intermediate):
Children: 7, Parent: 0
CPUs: 0-7 16-31
Memory: installed 192G, allocated 11G, free 181G
Lgroup resources: 1 5 7 (CPU); 1 5 7 (memory)
Latency: 21
With hyper-threading enabled (64 CPUs):

lgroup 0 (root):
Children: 3 4 6 8
CPUs: 0-63
Memory: installed 256G, allocated 27G, free 229G
Lgroup resources: 1 2 5 7 (CPU); 1 2 5 7 (memory)
Latency: 30
lgroup 1 (leaf):
Children: none, Parent: 3
CPUs: 0-7 32-39
Memory: installed 64G, allocated 9,4G, free 55G
Lgroup resources: 1 (CPU); 1 (memory)
Load: 0,0306
Latency: 10
lgroup 2 (leaf):
Children: none, Parent: 4
CPUs: 8-15 40-47
Memory: installed 64G, allocated 493M, free 64G
Lgroup resources: 2 (CPU); 2 (memory)
Load: 0,0624
Latency: 10
lgroup 3 (intermediate):
Children: 1, Parent: 0
CPUs: 0-15 24-47 56-63
Memory: installed 192G, allocated 26G, free 166G
Lgroup resources: 1 2 7 (CPU); 1 2 7 (memory)
Latency: 21
lgroup 4 (intermediate):
Children: 2, Parent: 0
CPUs: 0-23 32-55
Memory: installed 192G, allocated 11G, free 181G
Lgroup resources: 1 2 5 (CPU); 1 2 5 (memory)
Latency: 21
lgroup 5 (leaf):
Children: none, Parent: 6
CPUs: 16-23 48-55
Memory: installed 64G, allocated 1,0G, free 63G
Lgroup resources: 5 (CPU); 5 (memory)
Load: 0,000946
Latency: 10
lgroup 6 (intermediate):
Children: 5, Parent: 0
CPUs: 8-31 40-63
Memory: installed 192G, allocated 17G, free 175G
Lgroup resources: 2 5 7 (CPU); 2 5 7 (memory)
Latency: 21
lgroup 7 (leaf):
Children: none, Parent: 8
CPUs: 24-31 56-63
Memory: installed 64G, allocated 16G, free 48G
Lgroup resources: 7 (CPU); 7 (memory)
Load: 3,05e-05
Latency: 10
lgroup 8 (intermediate):
Children: 7, Parent: 0
CPUs: 0-7 16-39 48-63
Memory: installed 192G, allocated 26G, free 166G
Lgroup resources: 1 5 7 (CPU); 1 5 7 (memory)
Latency: 21
On 26.03.13 16:11, Garrett D'Amore wrote:
On Mar 26, 2013, at 6:44 AM, Bob Friesenhahn <[email protected]>
wrote:
On Tue, 26 Mar 2013, Sašo Kiselkov wrote:
Once I gave it bit more thought, I realized tmpfs *should* be faster,
since it doesn't traverse the block device/SCSI interface and instead
intercepts calls pretty high up the VFS stack. Nonetheless, I suspect
the tmpfs implementation isn't really designed for multi-GB/s throughput
(it's a filesystem for /tmp FFS, it's supposed to hold a couple of kB of
data anyway).
Sašo, you are continuing to ignore that the simple dd to tmpfs test turned in
abysmal results on this quad Xeon E5 system as compared to the many other
systems tested.
This seems to be a problem with the system, or the way Illumos runs on it.
Not necessarily. Higher lock contention can lead to surprising results in some
configurations. (Such as the speed of certain benchmarks actually *improving*
by either offlining cores or reducing processor speeds.) Whether that's the
case here I don't know. But tmpfs is the *wrong* way to benchmark memory speed.
I do have illumos on a Xeon E5 in my garage. It works pretty well, but I've
not spent a lot of time benchmarking or testing memory bandwidth.
- Garrett
I had not heard of Illumos running on a quad Xeon E5 system before now. It is
doubtful that Illumos has been seriously tweaked/tuned for particular newer
hardware since the days of Sun. Most efforts seem to be at the level of keeping
things running properly, without the considerable resources required for
performance tuning/testing. Maybe there are significant issues to be resolved.
Bob
--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
--
---------------------------------------------------------------------
Dipl.-Ing. Franz Schober
[email protected]
FirmOS Business Solutions Gmbh
Obstweg 4
A-8073 Graz-Neupirka
Tel +43 316 242322-10
Fax +43 316 242322-99
http://www.firmos.at