Terry L. Inzauro wrote:
> Alan Robertson wrote:
>> Terry L. Inzauro wrote:
>>> Andrew Beekhof wrote:
>>>> On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote:
>>>>> list,
>>>>>
>>>>> this is a continuation of another thread that was started a few weeks
>>>>> back. the original thread was started in regards to the setup of pingd.
>>>>> this thread is in regards to pingd not being able to start for whatever
>>>>> reason and i suspect my resource groups are not starting as a result ;(
>>>>>
>>>>> a little background:
>>>>>
>>>>> - two resource groups are defined. i want to split the two resource
>>>>> groups between nodes when both nodes are online. if both nodes are
>>>>> not online, then obviously, fail the resource group over to the other
>>>>> available node.
>>>>> - pingd configuration was previously verified correct by Alan R.
>>>>> - crm_verify passes
>>>>> - BasicSanityCheck 'does not pass' (fails on pingd checks)
>>>> pingd isn't failing...
>>>>
>>>> Apr 11 12:44:07 roxetta CTS: BadNews: heartbeat[13770]:
>>>> 2007/04/11_12:44:05 ERROR: glib: Error sending packet: Operation not
>>>> permitted
>>>> Apr 11 12:44:07 roxetta CTS: BadNews: heartbeat[13770]:
>>>> 2007/04/11_12:44:05 ERROR: write failure on ping 127.0.0.1.: Operation
>>>> not permitted
>>>>
>>>> these messages are from the heartbeat communications layer - and if
>>>> that's not working, then pingd has no hope at all.
>>>>
>>>> i have no idea why pinging localhost should fail - firewall?
>>>>
>>>>> - without pingd, the resource groups function as expected
>>>>> - heartbeat has been restarted
>>>>> - heartbeat hangs on stopping so i do the following ;)
>>>>> for i in `ps -ef | grep heart | awk '{print $2}'`; do kill $i; done
>>>>>
>>>>> i have noticed log entries in the log file that are obviously related
>>>>> to pingd. this however 'may' not be the case.
>>>>> would anyone be interested in lending a hand?
>>>>>
>>>>> heartbeat version = 2.0.8-r2
>>>>> OS = gentoo 2006.1
>>>>> kernel = 2.6.18 (i have tested both hardened<with grsecurity and pax>
>>>>> as well as generic)
>>>>>
>>>>>
>>>>>
>>>>> cibadmin -Q output, ptest output, BasicSanityCheck output and
>>>>> messages file are all attached as a .tar.bz2.
>>>>>
>>>>>
>>>>> believe me when i tell you that i am stumped. any assistance is
>>>>> greatly appreciated.
>>>>>
>>> no firewall. i tested with and without iptables. in fact i even unloaded
>>> ALL iptables modules just to be certain. so then i thought to myself:
>>> pax? perhaps grsecurity? no luck there either. i rebuilt a kernel
>>> without all of the grsec and pax hooks. no luck.
>>>
>>>
>>>
>>> destiny crm # lsmod
>>> Module Size Used by
>>> softdog 4752 0
>>> tun 9184 0
>>> e100 28360 0
>>> sym53c8xx 64820 0
>>> eepro100 25552 0
>>> scsi_transport_spi 18752 1 sym53c8xx
>>>
>>> destiny crm # ping 127.0.0.1
>>> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.097 ms
>>> 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.054 ms
>>>
>>> --- 127.0.0.1 ping statistics ---
>>> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms
>>> rtt min/avg/max/mdev = 0.054/0.075/0.097/0.023 ms
>>>
>>>
>>> so i re-ran BasicSanityCheck....same result. any ideas?
>> Here is something to run and check...
>>
>> ifconfig lo;ip addr show lo; route;ip route show
>>
>> Here's what it produces on my machine:
>> lo Link encap:Local Loopback
>> inet addr:127.0.0.1 Mask:255.0.0.0
>> inet6 addr: ::1/128 Scope:Host
>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>> RX packets:520006 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:520006 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:190990507 (182.1 Mb) TX bytes:190990507 (182.1 Mb)
>>
>> 1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> inet 127.0.0.1/8 scope host lo
>> inet6 ::1/128 scope host
>> valid_lft forever preferred_lft forever
>> Kernel IP routing table
>> Destination Gateway Genmask Flags Metric Ref Use Iface
>> 10.10.10.0 * 255.255.255.0 U 0 0 0 eth1
>> link-local * 255.255.0.0 U 0 0 0 eth1
>> loopback * 255.0.0.0 U 0 0 0 lo
>> default gw 0.0.0.0 UG 0 0 0 eth1
>> 10.10.10.0/24 dev eth1 proto kernel scope link src 10.10.10.5
>> 169.254.0.0/16 dev eth1 scope link
>> 127.0.0.0/8 dev lo scope link
>> default via 10.10.10.254 dev eth1
>>
>>
>> I don't know what I'm looking for to be different, but it's at least
>> somewhere to start...
>>
>>
>
> i know. there's not much to go on here....and i'm not a coder, but perhaps
> someone can look in glib as to why/when that error function would
> actually get called?
>
>
>
> destiny crm # ip addr show dev lo
> 2: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
>
>
> destiny crm # ip route show
> 10.254.254.0/30 dev eth1 proto kernel scope link src 10.254.254.2
> x.x.x.x/28 dev eth0 proto kernel scope link src 63.147.188.174
> 10.0.0.0/24 dev eth2 proto kernel scope link src 10.0.0.2
> 127.0.0.0/8 dev lo scope link
> default via x.x.x.x dev eth0
>
>
> destiny crm # ifconfig lo
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:37780 errors:0 dropped:0 overruns:0 frame:0
> TX packets:37780 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:5114592 (4.8 Mb) TX bytes:5114592 (4.8 Mb)
I wrote the code that prints that error message, and I know why it's
coming out: the attempt to send the packet to the loopback interface
failed.
>>>> 2007/04/11_12:44:05 ERROR: glib: Error sending packet: Operation not
>>>> permitted
>>>> 2007/04/11_12:44:05 ERROR: write failure on ping 127.0.0.1.: Operation
>>>> not permitted
These come from this statement in the code failing:
    if ((rc=sendto(ei->sock, (void *) icmp_pkt, pktsize, 0
    ,   (struct sockaddr *)&ei->addr
    ,   sizeof(struct sockaddr))) != (ssize_t)pktsize) {
        PILCallLog(LOG, PIL_CRIT, "Error sending packet: %s",
            strerror(errno));
        FREE(icmp_pkt);
        ha_msg_del(msg);
        return(HA_FAIL);
    }
We're trying to write to a raw socket (SOCK_RAW), which won't work
unless you're root. But I would expect that to fail on the socket()
call, not on the sendto().
Still, I believe the errno we're getting from sendto() is EPERM:
    #define EPERM 1 /* Operation not permitted */
I can't find any documented reason for sendto(2) to return EPERM.
I did, however, see a case on the mailing list where a sendto(2) on UDP
returned EPERM because of a firewall...
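To narrow down which call is actually being refused, something like the following standalone probe might help. It is only a sketch and not heartbeat code: the names probe_raw_icmp, icmp_checksum and enum probe_stage are made up for illustration, and it assumes a Linux/POSIX sockets environment. It opens a raw ICMP socket, sends one echo request, and reports whether socket() or sendto() failed and with which errno.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none of them come from heartbeat. */
enum probe_stage {
    PROBE_OK = 0,
    PROBE_BAD_ADDRESS,
    PROBE_SOCKET_FAILED,
    PROBE_SENDTO_FAILED
};

/* Standard Internet checksum, returned in network byte order. */
static unsigned short icmp_checksum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (unsigned long)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        sum += (unsigned long)(p[len - 1] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return htons((unsigned short)~sum);
}

/*
 * Send one ICMP echo request to `ip` over a raw socket.
 * Returns the stage that failed and stores the errno in *err
 * (0 on success).
 */
enum probe_stage probe_raw_icmp(const char *ip, int *err)
{
    /* type 8 (echo request), code 0, checksum 0, id 0, sequence 1 */
    unsigned char pkt[8] = { 8, 0, 0, 0, 0, 0, 0, 1 };
    struct sockaddr_in dst;
    unsigned short ck;
    ssize_t rc;
    int sock;

    assert(ip != NULL && err != NULL);
    *err = 0;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
        *err = EINVAL;
        return PROBE_BAD_ADDRESS;
    }

    /* This is where a missing root/CAP_NET_RAW normally shows up. */
    sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (sock < 0) {
        *err = errno;
        return PROBE_SOCKET_FAILED;
    }

    ck = icmp_checksum(pkt, sizeof(pkt));
    memcpy(pkt + 2, &ck, sizeof(ck));

    rc = sendto(sock, pkt, sizeof(pkt), 0,
                (struct sockaddr *)&dst, sizeof(dst));
    if (rc != (ssize_t)sizeof(pkt)) {
        *err = (rc < 0) ? errno : EIO;   /* save errno before close() */
        close(sock);
        return PROBE_SENDTO_FAILED;
    }

    close(sock);
    return PROBE_OK;
}
```

Calling probe_raw_icmp("127.0.0.1", &err) as root on the affected node and printing strerror(err) should show whether the EPERM really comes from sendto() rather than socket().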
It would be good anyway to check the effective user id that
heartbeat's write processes are running under, to make sure they're root.
I don't know how or why they wouldn't be, but it seems worth checking.
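One way to make that check, sketched here under the assumption of a Linux /proc filesystem: the "Uid:" line of /proc/<pid>/status lists the real, effective, saved and filesystem UIDs in that order. The helper name read_effective_uid is hypothetical and is not part of heartbeat.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up the effective UID of process `pid`
 * from /proc/<pid>/status (Linux-specific).
 * Returns 0 and stores the UID in *euid, or -1 if it cannot be read.
 */
int read_effective_uid(pid_t pid, uid_t *euid)
{
    char path[64];
    char line[256];
    unsigned long real_uid, eff_uid;
    FILE *fp;
    int result = -1;

    assert(euid != NULL);

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* "Uid:" is followed by real, effective, saved, fs UIDs. */
        if (sscanf(line, "Uid: %lu %lu", &real_uid, &eff_uid) == 2) {
            *euid = (uid_t)eff_uid;
            result = 0;
            break;
        }
    }

    fclose(fp);
    return result;
}
```

Run against the PID of each heartbeat process (for example the 13770 shown in the log lines); an effective UID of 0 means the process is running as root.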
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[EMAIL PROTECTED]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems