Re: [Linux-HA] how to check HBA with heartbeat

Cristina Bulfon Thu, 16 Apr 2009 23:59:24 -0700

Ciao.

This happens when I shut down heartbeat manually; the strange thing
is that it happens only on the master node. But if you say it is
normal, that is fine with me.

Now I am going to configure the resource monitor, and if I need some
help I will post an email with a different subject.


Thanks a lot for helping me

cristina

On Apr 16, 2009, at 4:43 PM, Dejan Muhamedagic wrote:

Ciao Cristina,

On Thu, Apr 16, 2009 at 09:28:59AM +0200, Cristina Bulfon wrote:

Ciao,
I've also solved the problem with the V2 style: the AFS script was
starting and stopping continuously.
The solution was to add a check of the daemon (the "status" option)
to the script, so that the script is LSB compliant.


Good.
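
For readers following the "status" fix above, here is a minimal sketch of the exit codes LSB expects from a "status" action: 0 when the program is running, 1 when it is dead but a pid file exists, 3 when it is not running. The pid-file approach and the names `lsb_status` and `pid_is_alive` are assumptions for illustration only; the thread does not show how the actual AFS script checks its daemon.

```python
import os

# Exit codes that LSB defines for the "status" action of an init script.
LSB_RUNNING = 0        # program is running
LSB_DEAD_PIDFILE = 1   # program is dead, but a pid file exists
LSB_NOT_RUNNING = 3    # program is not running


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


def lsb_status(pidfile):
    """Map the state of a pid-file-based daemon to an LSB status code.

    Hypothetical helper: assumes the daemon writes its pid to `pidfile`.
    """
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return LSB_NOT_RUNNING
    except ValueError:
        return LSB_DEAD_PIDFILE  # unreadable content: treat as stale
    if pid <= 0:
        return LSB_DEAD_PIDFILE  # not a valid pid
    return LSB_RUNNING if pid_is_alive(pid) else LSB_DEAD_PIDFILE
```

A script whose "status" returns 0 while the daemon is actually stopped (or the reverse) can mislead the cluster manager into restarting the resource repeatedly, which would match the start/stop loop described above.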

The problem with the killing of HBREAD etc. still remains, though.
Does anybody have a clue?


Looks like a normal shutdown to me. Or is it that heartbeat shuts
down by itself?

Thanks,

Dejan

Thanks

cristina


On Apr 15, 2009, at 4:06 PM, Cristina Bulfon wrote:
Ciao,

it seems to be solved:
it was my fault: I was trying to mount an xfs filesystem instead of
ext3. I corrected the typo and ran the filesystem check.
Everything seems to be working and I no longer get the "umount .. busy"
error, but when I tried to simulate the failure of the active node and
switch to the passive one, I got the following in the ha-debug log file
on the active node:

ResourceManager[10006]: 2009/04/15_15:59:45 info: Running /etc/ha.d/resource.d/Filesystem /dev/AFS/sda3 /vicepa/ xfs stop
Filesystem[10267]:      2009/04/15_15:59:45 INFO: Running stop for /dev/AFS/sda3 on /vicepa
Filesystem[10267]:      2009/04/15_15:59:45 INFO: Trying to unmount /vicepa
Filesystem[10267]:      2009/04/15_15:59:45 INFO: unmounted /vicepa successfully
Filesystem[10256]:      2009/04/15_15:59:45 INFO:  Success
INFO:  Success
ResourceManager[10006]: 2009/04/15_15:59:45 info: Running /etc/ha.d/resource.d/IPaddr 141.108.26.31/24/eth0 stop
In IP Stop
SIOCDELRT: No such process
IPaddr[10374]:  2009/04/15_15:59:45 INFO: ifconfig eth0:0 down
IPaddr[10345]:  2009/04/15_15:59:45 INFO:  Success
INFO:  Success
heartbeat[9993]: 2009/04/15_15:59:45 info: All HA resources relinquished.
heartbeat[8025]: 2009/04/15_15:59:45 WARN: 1 lost packet(s) for [afsitfs4.roma1.infn.it] [50:52]
heartbeat[8025]: 2009/04/15_15:59:45 info: No pkts missing from afsitfs4.roma1.infn.it!
...
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBREAD process 8030 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBWRITE process 8031 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBREAD process 8032 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBWRITE process 8033 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBREAD process 8034 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBFIFO process 8028 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: killing HBWRITE process 8029 with signal 15
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8029 exited. 7 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8028 exited. 6 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8031 exited. 5 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8030 exited. 4 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8032 exited. 3 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8034 exited. 2 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: Core process 8033 exited. 1 remaining
heartbeat[8025]: 2009/04/15_15:59:47 info: afsitfs3.roma1.infn.it Heartbeat shutdown complete.
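
One way to check that a log like the one above describes an orderly shutdown is to confirm that every helper process heartbeat signalled also reported an exit, and that the final "shutdown complete" line is present. The sketch below does that; the function name `summarize_shutdown` and the regular expressions are illustrative assumptions written against the messages quoted in this thread, not a documented heartbeat log format.

```python
import re

# Patterns based on the messages quoted in this thread. "\s*" tolerates a
# missing space, which can happen when mail archives re-wrap long lines.
KILL_RE = re.compile(r"killing (HB\w+) process\s*(\d+)")
EXIT_RE = re.compile(r"Core process (\d+)\s*exited")


def summarize_shutdown(lines):
    """Collect signalled and exited pids from heartbeat log lines."""
    killed, exited, complete = set(), set(), False
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            killed.add(int(match.group(2)))
            continue
        match = EXIT_RE.search(line)
        if match:
            exited.add(int(match.group(1)))
            continue
        if "Heartbeat shutdown complete" in line:
            complete = True
    return {
        "killed": killed,
        "exited": exited,
        "still_running": killed - exited,  # signalled but never seen exiting
        "complete": complete,
    }
```

Applied to the log above, the seven signalled pids (8028 to 8034) all appear in "exited" lines and the completion message is present, which is consistent with a normal shutdown rather than a crash.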

Thanks

cristina


On Apr 15, 2009, at 3:16 PM, Dejan Muhamedagic wrote:
Ciao,

On Wed, Apr 15, 2009 at 01:37:40PM +0200, Cristina Bulfon wrote:
On Apr 15, 2009, at 1:18 PM, Dejan Muhamedagic wrote:
Ciao,

On Wed, Apr 15, 2009 at 12:53:41PM +0200, Cristina Bulfon wrote:
Ciao Dejan,

I am going back and forth on this item :-)
I moved to version 2.1.4 and back to the V1 style... I no longer use
DRBD, just the mount.
Do you need drbd?
No. When I first started using heartbeat I couldn't manage the
filesystem mount with heartbeat, so I used DRBD as a workaround.
I don't need it since my devices are visible through the SAN.
OK. Make sure that you also configure fencing/stonith!
So the haresources file is as follows:

afsitfs3.roma1.infn.it  IPaddr::141.108.26.31/24/eth0
afsitfs3.roma1.infn.it   Filesystem::/dev/AFS/sda3::/vicepa::xfs
afsitfs3.roma1.infn.it Filesystem::/dev/AFS/sda1::/usr/afs::ext3
afsitfs3.roma1.infn.it  141.108.26.31   afs

When I put the master node in standby or stop heartbeat, the
following happens:

- it tries to umount the filesystems before stopping "afs"..
Isn't afs stopped before the filesystem?
That is the problem, I don't understand why... it seems that
the stop is performed in the same order as the "start".
That can't be. Really. Can't recall anymore how v1 works, perhaps
it looks at the status before deciding whether to stop a
resource.
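
To make the ordering question concrete, here is a small sketch that reads a haresources file and prints the order in which resources would be started and, assuming the stop order is simply the reverse of the start order, stopped. That reverse-order behaviour is the expectation voiced in this exchange, not something this thread confirms for v1; the treatment of several lines for the same node as one sequence in file order is also an assumption, and backslash line continuations are not handled. The function names are hypothetical.

```python
def parse_haresources(text):
    """Return a list of (node, [resources]) groups, one per non-empty line.

    Assumption: "#" starts a comment and fields are whitespace separated.
    """
    groups = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        node, *resources = line.split()
        groups.append((node, resources))
    return groups


def start_order(groups):
    """Resources in file order, left to right within each line."""
    return [res for _node, resources in groups for res in resources]


def stop_order(groups):
    """Assumed stop order: the exact reverse of the start order."""
    return list(reversed(start_order(groups)))


HARESOURCES = """\
afsitfs3.roma1.infn.it  IPaddr::141.108.26.31/24/eth0
afsitfs3.roma1.infn.it   Filesystem::/dev/AFS/sda3::/vicepa::xfs
afsitfs3.roma1.infn.it Filesystem::/dev/AFS/sda1::/usr/afs::ext3
afsitfs3.roma1.infn.it  141.108.26.31   afs
"""

if __name__ == "__main__":
    groups = parse_haresources(HARESOURCES)
    print("start:", start_order(groups))
    print("stop: ", stop_order(groups))
```

Under these assumptions "afs" is the first resource to stop and the filesystems follow, so an unmount that runs while afs still holds files open points to something else, such as a "status" result that makes heartbeat skip the afs stop.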
umount: /vicepa: device is busy
umount: /vicepa: device is busy
Filesystem[3427]: 2009/04/14_09:16:52 ERROR: Couldn't unmount /vicepa; trying cleanup with SIGTERM
/vicepa:
This may be normal, i.e. there could be processes using the
filesystem, though typically there are only applications which
depend on the filesystem (in this case afs) which should be
doing something there. If this is a concern, you should check
which processes have files open over there (fuser,lsof).
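
As a rough illustration of what fuser and lsof report, the sketch below walks a Linux /proc tree and lists the processes whose working directory or open file descriptors point under a given mount point. It is not a replacement for those tools (it ignores memory-mapped files, for instance), and the names `is_under` and `pids_using` are hypothetical. Reading other users' processes normally requires root.

```python
import os


def is_under(path, mountpoint):
    """True if `path` is the mount point itself or lies below it."""
    mountpoint = mountpoint.rstrip("/") or "/"
    prefix = mountpoint.rstrip("/") + "/"
    return path == mountpoint or path.startswith(prefix)


def pids_using(mountpoint, proc_root="/proc"):
    """Map pid -> set of paths under `mountpoint` that the process holds.

    Only the cwd link and the fd/* links of each process are inspected.
    """
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited meanwhile, or permission denied
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if is_under(target, mountpoint):
                users.setdefault(int(entry), set()).add(target)
    return users


if __name__ == "__main__":
    for pid, paths in sorted(pids_using("/vicepa").items()):
        print(pid, sorted(paths))
```

If the only pids listed belong to the AFS server processes, the "device is busy" error means afs was still running when the unmount was attempted.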
With version 2.1.3 I didn't see any of those messages; everything
in the V1 style was fine.
I suspect that the afs RA is not working correctly, in particular
the status operation.
I will take a look

thanks cristina
Thanks,

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems