Have you tried putting the ctdb files onto a separate gpfs filesystem?

Vic Cornell
[email protected]

On 12 Apr 2013, at 16:43, Orlando Richards <[email protected]> wrote:

> On 12/04/13 15:43, Bob Cregan wrote:
>> Hi Orlando,
>> We use ctdb/samba for CIFS, and CNFS for NFS
>> (GPFS version 3.4.0-13). Current versions are
>>
>> ctdb - 1.0.99
>> samba 3.5.15
>>
>> Both compiled from source. We have about 300+ users normally.
>>
>
> We have suspicions that 3.6 has put additional "chatter" into the ctdb
> database stream, which has pushed us over the edge. Barry Evans has found
> that the clustered locking databases, in particular, prove to be a
> scalability/usability limit for ctdb.
>
>
>> We have had no issues with this setup apart from CNFS which had 2 or 3
>> bad moments over the last year. These have gone away since we have
>> fixed a bug with our 10G NIC drivers (emulex cards, kernel module
>> be2net) which led to occasional dropped packets for jumbo frames. There
>> have been no issues with samba/ctdb.
>>
>> The only comment I can make is that during initial investigations into
>> an upgrade of samba to 3.6.x we discovered that the 3.6 code would not
>> compile against ctdb 1.0.99 (compilation requires the ctdb source)
>> with error messages like:
>>
>> configure: checking whether cluster support is available
>> checking for ctdb.h... yes
>> checking for ctdb_private.h... yes
>> checking for CTDB_CONTROL_TRANS3_COMMIT declaration... yes
>> checking for CTDB_CONTROL_SCHEDULE_FOR_DELETION declaration... no
>> configure: error: "cluster support not available: support for
>> SCHEDULE_FOR_DELETION control missing"
>>
>>
>> What occurs to me is that this message seems to indicate that it is
>> possible to run a ctdb version that is incompatible with samba 3.6.
>> That would imply that an upgrade to a higher version of ctdb might
>> help, of course it might not and make backing out harder.
>
> Certainly 1.0.114 builds fine - I've not tried 2.0, I'm too scared! The
> versioning in CTDB has proved hard for me to fathom...
>
>>
>> A compile against ctdb 2.0 works fine. We will soon be running in this
>> upgrade, but I'm waiting to see what the samba people say at the UG
>> meeting first!
>>
>
> It has to be said - the timing is good!
> Cheers,
> Orlando
>
>>
>> Thanks
>>
>> Bob
>>
>>
>> On 12 April 2013 13:37, Orlando Richards <[email protected]> wrote:
>>
>>     Hi folks,
>>
>>     We've long been using CTDB and Samba for our NAS service, servicing
>>     ~500 users. We've been suffering from some problems with the CTDB
>>     performance over the last few weeks, likely triggered either by an
>>     upgrade of samba from 3.5 to 3.6 (and enabling of SMB2 as a result),
>>     or possibly by additional users coming on with a new workload.
>>
>>     We run CTDB 1.0.114.4-1 (from sernet) and samba3-3.6.12-44 (again,
>>     from sernet). Before we roll back, we'd like to make sure we can't
>>     fix the problem and stick with Samba 3.6 (and we don't even know
>>     that a roll back would fix the issue).
>>
>>     The symptoms are a complete freeze of the service for CIFS users for
>>     10-60 seconds, and on the servers a corresponding spawning of large
>>     numbers of CTDB processes, which seem to be created in a "big bang",
>>     and then do what they do and exit in the subsequent 10-60 seconds.
>>
>>     We also serve up NFS from the same ctdb-managed frontends, and GPFS
>>     from the cluster - and these are both fine throughout.
>>
>>     This was happening 5-10 times per hour, not at exact intervals
>>     though. When we added a third node to the CTDB cluster, it "got
>>     worse", and when we dropped the CTDB cluster down to a single node
>>     everything started behaving fine - which is where we are now.
>>
>>     So, I've got a bunch of questions!
>>
>>     - does anyone know why ctdb would be spawning these processes, and
>>       if there's anything we can do to stop it needing to do it?
>>     - has anyone done any more general performance / config
>>       optimisation of CTDB?
>>
>>     And - more generally - does anyone else actually use ctdb/samba/gpfs
>>     on the scale of ~500 users or higher? If so - how do you find it?
>>
>>
>>     --
>>     --
>>     Dr Orlando Richards
>>     Information Services
>>     IT Infrastructure Division
>>     Unix Section
>>     Tel: 0131 650 4994
>>
>>     The University of Edinburgh is a charitable body, registered in
>>     Scotland, with registration number SC005336.
>>     _________________________________________________
>>     gpfsug-discuss mailing list
>>     [email protected]
>>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>> --
>> Bob Cregan
>> Senior Storage Systems Administrator
>> ACRC
>> Bristol University
>> Tel: +44 (0) 117 331 4406
>> skype: bobcregan
>> Mobile: +44 (0) 7712388129
>>
>
>
> --
> --
> Dr Orlando Richards
> Information Services
> IT Infrastructure Division
> Unix Section
> Tel: 0131 650 4994
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
> _______________________________________________
> gpfsug-discuss mailing list
> [email protected]
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

_______________________________________________
gpfsug-discuss mailing list
[email protected]
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
