Thanks Frank, works just great!
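
For the archive, here is a minimal sketch of the kind of edit Frank describes. It runs against a scratch file, not the live /usr/lib/ocf/resource.d/heartbeat/Filesystem script. The sample fragment, its initial timeout of 60, the stop action line, and the temporary file names are assumptions for illustration only; the start action line with timeout="300" is the only part taken from Frank's mail.

```shell
#!/bin/sh
# Hypothetical scratch copy standing in for the <actions> part of the
# resource agent's meta-data; the real script contains much more.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<actions>
<action name="start" timeout="60" />
<action name="stop" timeout="60" />
</actions>
EOF

# Show the start action before the change (60 is an assumed value).
grep 'action name="start"' "$sample"

# Raise only the start action's timeout to 300 seconds, writing the
# result to a new file so the original stays untouched.
sed 's|\(<action name="start" timeout="\)[0-9]*|\1300|' "$sample" > "$sample.new"

# Show the start action after the change.
grep 'action name="start"' "$sample.new"
```

On a real node it is probably safer to back up the script first and make the same one-line change by hand, then check the value actually present on your system before and after.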
Greetings!

On 12/23/11 13:18, Frank Heckes wrote:
> Hi,
>
> we had the same problem. We 'fixed' it by increasing the start
> timeout in the Linux-HA
> script /usr/lib/ocf/resource.d/heartbeat/Filesystem
>
> ...
> <action name="start" timeout="300" />
> ...
>
> If you use pacemaker or RH cluster suite (although your config dir looks
> like linux-ha) there's probably a similar parameter.
>
> Cheers
>
> -Frank
>
> On Thu, 2011-12-22 at 16:38 +0100, Patrice Hamelin wrote:
>> Hi,
>>
>> I have a heartbeat problem while trying automatic failover. Manual
>> failover works great: unmounting a partition from an OSS and
>> remounting it on another one makes the clients recover. It all starts
>> with this error:
>>
>> Filesystem[7650]: 2011/12/22_14:36:05 ERROR: Couldn't mount
>> filesystem /dev/mpath/colosse4-lun60-sata on /mnt/data/clun60
>> Filesystem[7639]: 2011/12/22_14:36:05 ERROR: Generic error
>>
>> As a result, the failover OSS is the wrong one and the clients stay
>> in this state forever:
>>
>> sata-OST0000_UUID : Resource temporarily unavailable
>>
>> Here is my heartbeat config:
>>
>> [root@ib3-st02 ~]# cat /etc/ha.d/ha.cf
>> # log file settings
>> # write debug output to /var/log/ha-debug
>> debugfile /var/log/ha-debug
>> # write log messages to /var/log/ha-log
>> logfile /var/log/ha-log
>> # use syslog to write to logfiles
>> logfacility local0
>> # set some time-outs. these values are only recommendations, which
>> # depend e.g. on the OSS load
>> # send keep-alive packages every 2 seconds
>> keepalive 2
>> # wait 90 seconds before declaring a node dead
>> deadtime 90
>> # write a warning to the logfile after 30 seconds without an answer
>> # from the failover node
>> warntime 30
>> # wait for 120 seconds before declaring a node dead after heartbeat
>> # is brought up
>> initdead 120
>> # define communication channels
>> # use port 12345 to communicate with fail-over node
>> udpport 12345
>> # use network interfaces eth0 and ib0 to detect a failed node
>> bcast eth0 bond0
>> # Use manual failback
>> auto_failback off
>> # node names in this failover-pair. These names must match the
>> # output of `hostname`
>> node ib3-st01
>> node ib3-st02
>> node ib3-st03
>> node ib3-st04
>>
>> [root@ib3-st02 ~]# cat /etc/ha.d/haresources
>> ib3-st01 Filesystem::/dev/emcssd-1/mdt-sata::/mnt/mdt-colosse::lustre
>> ib3-st01 Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
>> ib3-st02 Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
>> ib3-st03 Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
>> ib3-st04 Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
>> ib3-st01 Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
>> ib3-st02 Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
>> ib3-st03 Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
>> ib3-st04 Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre
>>
>> It is the same on all OSSs.
>>
>> Has anybody ever encountered this problem?
>> Thanks for your help.
>>
>
> ------------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Registered office: Juelich
> Registered in the commercial register of the district court of Dueren, No. HR B 3498
> Chairman of the Supervisory Board: MinDirig Dr. Karl Eugen Huthmacher
> Board of Directors: Prof. Dr. Achim Bachem (Chairman),
> Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> ------------------------------------------------------------------------------------------------
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada
