Maris Ruskulis wrote:
It seems that now, with autoscaling off, glusterfs is running stably; at least, I could not kill it with iozone.


Yes, autoscaling is still under consideration as a feature worth having.
Please avoid using it for the time being.

Thanks
Shehjar

Maris Ruskulis wrote:
Thank you for the reply! As you can see from the config, ping-timeout is not set, so the default is assumed. I have now started glusterfs with 8 threads on both server and client (autoscaling switched off).
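For reference, the change described above (autoscaling off, a fixed pool of 8 threads) might look like this in the server volfile quoted further down. This is a sketch only: the `thread-count` option name is assumed from the glusterfs 2.0 performance/io-threads translator and should be checked against the installed version. The same two options would go into the `iothreads` volume of the client volfile.

```
volume brick
  type performance/io-threads
  option autoscaling off
  # assumed option name: fixed number of worker threads instead of autoscaling
  option thread-count 8
  subvolumes locks
end-volume
```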

Hardware:
*server1:*
lspci
00:00.0 Host bridge: Intel Corporation E7505 Memory Controller Hub (rev 03)
00:00.1 Class ff00: Intel Corporation E7505/E7205 Series RAS Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation E7505/E7205 PCI-to-AGP Bridge (rev 03)
00:02.0 PCI bridge: Intel Corporation E7505 Hub Interface B PCI-to-PCI Bridge (rev 03)
00:02.1 Class ff00: Intel Corporation E7505 Hub Interface B PCI-to-PCI Bridge RAS Controller (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 82)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 02)
02:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
02:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
02:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
03:01.0 RAID bus controller: Intel Corporation RAID Controller
04:02.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
05:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
05:03.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 0d)

cat /proc/cpuinfo
processor    : 0
vendor_id    : GenuineIntel
cpu family    : 15
model        : 2
model name    : Intel(R) Xeon(TM) CPU 2.40GHz
stepping    : 5
cpu MHz        : 2392.024
cache size    : 512 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 1
apicid        : 0
initial apicid    : 0
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 2
wp        : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips    : 4784.04
clflush size    : 64
power management:

processor    : 1
vendor_id    : GenuineIntel
cpu family    : 15
model        : 2
model name    : Intel(R) Xeon(TM) CPU 2.40GHz
stepping    : 5
cpu MHz        : 2392.024
cache size    : 512 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 1
apicid        : 1
initial apicid    : 1
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 2
wp        : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips    : 4784.16
clflush size    : 64
power management:

*server2:*
lspci
00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c)
00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
02:03.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
02:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
03:01.0 I2O: LSI Logic / Symbios Logic MegaRAID (rev 01)
05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller (rev 18)
07:04.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

cat /proc/cpuinfo
processor    : 0
vendor_id    : GenuineIntel
cpu family    : 15
model        : 4
model name    :                   Intel(R) Xeon(TM) CPU 2.80GHz
stepping    : 1
cpu MHz        : 2792.955
cache size    : 1024 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 5
wp        : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips    : 5590.46
clflush size    : 64
cache_alignment    : 128
address sizes    : 36 bits physical, 48 bits virtual
power management:

processor    : 1
vendor_id    : GenuineIntel
cpu family    : 15
model        : 4
model name    :                   Intel(R) Xeon(TM) CPU 2.80GHz
stepping    : 1
cpu MHz        : 2792.955
cache size    : 1024 KB
physical id    : 3
siblings    : 2
core id        : 0
cpu cores    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 5
wp        : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips    : 5586.06
clflush size    : 64
cache_alignment    : 128
address sizes    : 36 bits physical, 48 bits virtual
power management:

processor    : 2
vendor_id    : GenuineIntel
cpu family    : 15
model        : 4
model name    :                   Intel(R) Xeon(TM) CPU 2.80GHz
stepping    : 1
cpu MHz        : 2792.955
cache size    : 1024 KB
physical id    : 0
siblings    : 2
core id        : 0
cpu cores    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 5
wp        : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips    : 5586.02
clflush size    : 64
cache_alignment    : 128
address sizes    : 36 bits physical, 48 bits virtual
power management:

processor    : 3
vendor_id    : GenuineIntel
cpu family    : 15
model        : 4
model name    :                   Intel(R) Xeon(TM) CPU 2.80GHz
stepping    : 1
cpu MHz        : 2792.955
cache size    : 1024 KB
physical id    : 3
siblings    : 2
core id        : 0
cpu cores    : 1
fpu        : yes
fpu_exception    : yes
cpuid level    : 5
wp        : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips    : 5586.05
clflush size    : 64
cache_alignment    : 128
address sizes    : 36 bits physical, 48 bits virtual
power management:



jvanwanr...@chatventure.nl wrote:
Hi Maris,

Can you tell me more about the hardware you are using? In our tests yesterday we had some trouble with very high load in conjunction with autoscaling. You could try a fixed number of threads. By the way, what are your ping-timeout settings?

Best Regards Jasper

Jasper van Wanrooy - Chatventure BV
Technical Manager
T: +31 (0) 6 47 248 722
E: jvanwanr...@chatventure.nl
W: www.chatventure.nl


----- Original Message -----
From: "Maris Ruskulis" <ma...@chown.lv>
To: gluster-users@gluster.org
Sent: Friday, 29 May, 2009 10:11:45 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Gluster-users] Glusterfs 2.0 hangs on high load

Is there a way to solve this issue?

Maris Ruskulis wrote:

    I have the same issue with the same config when both nodes are
    x64. The difference is that there are no bailout messages in the logs.

    Jasper van Wanrooy - Chatventure wrote:

        Hi Maris,

        I regret to hear that. I was also having stability problems
        on 32-bit platforms. Perhaps you should try it on a 64-bit
        platform. Is that an option?

        Best Regards Jasper


        On 28 May 2009, at 09:36, Maris Ruskulis wrote:

            Hello!
            After upgrading to version 2.0 (now using 2.0.1), I'm
            experiencing problems with glusterfs stability.
            I'm running a 2-node setup with client-side AFR, and
            glusterfsd is also running on the same servers. From time
            to time glusterfs just hangs; I can reproduce this by
            running the iozone benchmarking tool. I'm using a patched
            FUSE, but the result is the same with an unpatched one.

================================================================================
            Version      : glusterfs 2.0.1 built on May 27 2009 16:04:01
            TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
            Starting Time: 2009-05-27 16:38:20
            Command line : /usr/sbin/glusterfsd
            --volfile=/etc/glusterfs/glusterfs-server.vol
            --pid-file=/var/run/glusterfsd.pid
            --log-file=/var/log/glusterfsd.log
            PID          : 31971
            System name  : Linux
            Nodename     : weeber.st-inst.lv
            Kernel Release : 2.6.28-hardened-r7
            Hardware Identifier: i686

            Given volfile:
+------------------------------------------------------------------------------+
            1: # file: /etc/glusterfs/glusterfs-server.vol
            2: volume posix
            3:   type storage/posix
            4:   option directory /home/export
            5: end-volume
            6:
            7: volume locks
            8:   type features/locks
            9:   option mandatory-locks on
            10:   subvolumes posix
            11: end-volume
            12:
            13: volume brick
            14:   type performance/io-threads
            15:   option autoscaling on
            16:   subvolumes locks
            17: end-volume
            18:
            19: volume server
            20:   type protocol/server
            21:   option transport-type tcp
            22:   option auth.addr.brick.allow 127.0.0.1,192.168.1.*
            23:   subvolumes brick
            24: end-volume

+------------------------------------------------------------------------------+
            [2009-05-27 16:38:20] N [glusterfsd.c:1152:main]
            glusterfs: Successfully started
            [2009-05-27 16:38:33] N
            [server-protocol.c:7035:mop_setvolume] server: accepted
            client from 192.168.1.233:1021
            [2009-05-27 16:38:33] N
            [server-protocol.c:7035:mop_setvolume] server: accepted
            client from 192.168.1.233:1020
            [2009-05-27 16:38:46] N
            [server-protocol.c:7035:mop_setvolume] server: accepted
            client from 192.168.1.252:1021
            [2009-05-27 16:38:46] N
            [server-protocol.c:7035:mop_setvolume] server: accepted
            client from 192.168.1.252:1020

================================================================================
            Version      : glusterfs 2.0.1 built on May 27 2009 16:04:01
            TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
            Starting Time: 2009-05-27 16:38:46
            Command line : /usr/sbin/glusterfs -N -f
            /etc/glusterfs/glusterfs-client.vol /mnt/gluster
            PID          : 32161
            System name  : Linux
            Nodename     : weeber.st-inst.lv
            Kernel Release : 2.6.28-hardened-r7
            Hardware Identifier: i686

            Given volfile:
+------------------------------------------------------------------------------+
            1: volume xeon
            2:   type protocol/client
            3:   option transport-type tcp
            4:   option remote-host 192.168.1.233
            5:   option remote-subvolume brick
            6: end-volume
            7:
            8: volume weeber
            9:   type protocol/client
            10:   option transport-type tcp
            11:   option remote-host 192.168.1.252
            12:   option remote-subvolume brick
            13: end-volume
            14:
            15: volume replicate
            16:  type cluster/replicate
            17:  subvolumes xeon weeber
            18: end-volume
            20: volume readahead
            21:   type performance/read-ahead
            22:   option page-size 128kB
            23:   option page-count 16
            24:   option force-atime-update off
            25:   subvolumes replicate
            26: end-volume
            27:
            28: volume writebehind
            29:   type performance/write-behind
            30:   option aggregate-size 1MB
            31:   option window-size 3MB
            32:   option flush-behind on
            33:   option enable-O_SYNC on
            34:   subvolumes readahead
            35: end-volume
            36:
            37: volume iothreads
            38:   type performance/io-threads
            39:   option autoscaling on
            40:   subvolumes writebehind
            41: end-volume
            42:
            43:
            44:
            45: #volume bricks
            46: #type cluster/distribute
            47:  #option lookup-unhashed yes
            48:  #option min-free-disk 20%
            49: # subvolumes weeber xeon
            50: #end-volume

+------------------------------------------------------------------------------+
            [2009-05-27 16:38:46] W
            [xlator.c:555:validate_xlator_volume_options]
            writebehind: option 'window-size' is deprecated,
            preferred is 'cache-size', continuing with correction
            [2009-05-27 16:38:46] W
            [glusterfsd.c:455:_log_if_option_is_invalid] writebehind:
            option 'aggregate-size' is not recognized
            [2009-05-27 16:38:46] W
            [glusterfsd.c:455:_log_if_option_is_invalid] readahead:
            option 'page-size' is not recognized
            [2009-05-27 16:38:46] N [glusterfsd.c:1152:main]
            glusterfs: Successfully started
            [2009-05-27 16:38:46] N
            [client-protocol.c:5557:client_setvolume_cbk] xeon:
            Connected to 192.168.1.233:6996, attached to remote
            volume 'brick'.
            [2009-05-27 16:38:46] N [afr.c:2190:notify] replicate:
            Subvolume 'xeon' came back up; going online.
            [2009-05-27 16:38:46] N
            [client-protocol.c:5557:client_setvolume_cbk] xeon:
            Connected to 192.168.1.233:6996, attached to remote
            volume 'brick'.
            [2009-05-27 16:38:46] N [afr.c:2190:notify] replicate:
            Subvolume 'xeon' came back up; going online.
            [2009-05-27 16:38:46] N
            [client-protocol.c:5557:client_setvolume_cbk] weeber:
            Connected to 192.168.1.252:6996, attached to remote
            volume 'brick'.
            [2009-05-27 18:46:02] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 18:16:01. frame-timeout = 1800
            [2009-05-27 19:16:09] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 18:46:02. frame-timeout = 1800
            [2009-05-27 19:46:18] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPEN(12) frame sent =
            2009-05-27 19:16:09. frame-timeout = 1800
            [2009-05-27 20:16:25] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 19:46:18. frame-timeout = 1800
            [2009-05-27 20:46:34] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 20:16:25. frame-timeout = 1800
            [2009-05-27 21:16:41] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPEN(12) frame sent =
            2009-05-27 20:46:34. frame-timeout = 1800
            [2009-05-27 21:47:00] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 21:16:53. frame-timeout = 1800
            [2009-05-27 22:17:07] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 21:47:00. frame-timeout = 1800
            [2009-05-27 22:47:15] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPENDIR(21) frame sent =
            2009-05-27 22:17:07. frame-timeout = 1800
            [2009-05-27 23:17:23] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 22:47:15. frame-timeout = 1800
            [2009-05-27 23:47:31] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPEN(12) frame sent =
            2009-05-27 23:17:23. frame-timeout = 1800
            [2009-05-28 00:17:39] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-27 23:47:32. frame-timeout = 1800
            [2009-05-28 00:47:47] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 00:17:39. frame-timeout = 1800
            [2009-05-28 01:17:55] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPENDIR(21) frame sent =
            2009-05-28 00:47:47. frame-timeout = 1800
            [2009-05-28 01:48:03] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 01:17:55. frame-timeout = 1800
            [2009-05-28 02:18:11] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPEN(12) frame sent =
            2009-05-28 01:48:03. frame-timeout = 1800
            [2009-05-28 02:48:29] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 02:18:24. frame-timeout = 1800
            [2009-05-28 03:18:37] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 02:48:29. frame-timeout = 1800
            [2009-05-28 03:48:45] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 03:18:37. frame-timeout = 1800
            [2009-05-28 04:18:53] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame XATTROP(40) frame sent =
            2009-05-28 03:48:45. frame-timeout = 1800
            [2009-05-28 04:49:01] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 04:18:53. frame-timeout = 1800
            [2009-05-28 05:19:09] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame OPENDIR(21) frame sent =
            2009-05-28 04:49:01. frame-timeout = 1800
            [2009-05-28 05:49:17] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 05:19:09. frame-timeout = 1800
            [2009-05-28 06:19:25] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 05:49:17. frame-timeout = 1800
            [2009-05-28 06:49:33] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame XATTROP(40) frame sent =
            2009-05-28 06:19:25. frame-timeout = 1800
            [2009-05-28 07:19:40] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 06:49:33. frame-timeout = 1800
            [2009-05-28 07:49:48] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 07:19:40. frame-timeout = 1800
            [2009-05-28 08:19:56] E [client-protocol.c:292:call_bail]
            weeber: bailing out frame LOOKUP(32) frame sent =
            2009-05-28 07:49:48. frame-timeout = 1800
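In the client log above, each bail-out fires roughly 1800 seconds (the frame-timeout) after the frame was sent, and each new stuck frame is sent at about the moment the previous one bails out. That pattern suggests the `weeber` subvolume stopped answering altogether rather than timing out intermittently. The short Python sketch below is not part of glusterfs (the function name `parse_bailouts` is hypothetical); it extracts `call_bail` entries in this log format and computes how long each frame waited, which may help when comparing logs across runs.

```python
import re
from datetime import datetime

# Matches a call_bail entry in the format shown in the client log above,
# once any line wrapping has been collapsed into single spaces.
BAIL_RE = re.compile(
    r"\[(?P<bailed>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] E "
    r"\[client-protocol\.c:\d+:call_bail\]\s+(?P<subvol>\S+): "
    r"bailing out frame (?P<fop>\w+)\(\d+\) frame sent =\s+"
    r"(?P<sent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\. "
    r"frame-timeout = (?P<timeout>\d+)"
)

TIME_FMT = "%Y-%m-%d %H:%M:%S"


def parse_bailouts(log_text):
    """Return (subvolume, fop, seconds_waited, frame_timeout) per bail-out."""
    # Collapse all whitespace so entries wrapped over several lines
    # (as in the quoted log) can be matched as one string.
    text = " ".join(log_text.split())
    results = []
    for m in BAIL_RE.finditer(text):
        sent = datetime.strptime(m["sent"], TIME_FMT)
        bailed = datetime.strptime(m["bailed"], TIME_FMT)
        waited = int((bailed - sent).total_seconds())
        results.append((m["subvol"], m["fop"], waited, int(m["timeout"])))
    return results


if __name__ == "__main__":
    sample = """[2009-05-27 18:46:02] E [client-protocol.c:292:call_bail]
    weeber: bailing out frame LOOKUP(32) frame sent =
    2009-05-27 18:16:01. frame-timeout = 1800"""
    print(parse_bailouts(sample))  # prints [('weeber', 'LOOKUP', 1801, 1800)]
```

A wait that always lands just past the configured frame-timeout, on the same subvolume, points at a hung server or translator stack rather than at network latency.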


_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
