Terry L. Inzauro wrote:
> Alan Robertson wrote:
>> Terry L. Inzauro wrote:
>>> Andrew Beekhof wrote:
>>>> On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote:
>>>>> list,
>>>>>
>>>>> this is a continuation of another thread that was started a few weeks
>>>>> back. the original thread was about the setup of pingd. this thread
>>>>> is about pingd not being able to start for whatever reason, and i
>>>>> suspect my resource groups are not starting as a result ;(
>>>>>
>>>>> a little background:
>>>>>
>>>>> - two resource groups are defined. i want to split the two resource
>>>>> groups between nodes when both nodes are online. if only one node is
>>>>> online, then obviously, fail the resource group over to the other
>>>>> available node.
>>>>> - pingd configuration was previously verified correct by Alan R.
>>>>> - crm_verify passes
>>>>> - BasicSanityCheck 'does not pass' (fails on pingd checks)
>>>> pingd isn't failing...
>>>>
>>>> Apr 11 12:44:07 roxetta CTS: BadNews: heartbeat[13770]: 2007/04/11_12:44:05 ERROR: glib: Error sending packet: Operation not permitted
>>>> Apr 11 12:44:07 roxetta CTS: BadNews: heartbeat[13770]: 2007/04/11_12:44:05 ERROR: write failure on ping 127.0.0.1.: Operation not permitted
>>>>
>>>> these messages are from the heartbeat communications layer - and if
>>>> that's not working, then pingd has no hope at all.
>>>>
>>>> i have no idea why pinging localhost should fail - firewall?
>>>>
>>>>> - without pingd, the resource groups function as expected
>>>>> - heartbeat has been restarted
>>>>> - heartbeat hangs when stopping, so i do the following ;)
>>>>>         for i in `ps -ef | grep heart | awk '{print $2}'`; do kill $i; done
>>>>>
>>>>> i have noticed log entries in the log file that appear to be related
>>>>> to pingd. this, however, 'may' not be the case.
>>>>> would anyone be interested in lending a hand?
>>>>>
>>>>> heartbeat version = 2.0.8-r2
>>>>> OS = gentoo 2006.1
>>>>> kernel = 2.6.18 (i have tested both hardened<with grsecurity and pax>
>>>>> as well as generic)
>>>>>
>>>>>
>>>>>
>>>>> cibadmin -Q output , ptest output, BasicSanityCheck output and
>>>>> messages file are all attached as a
>>>>> .tar.bz2.
>>>>>
>>>>>
>>>>> believe me when i tell you that i am stumped. any assistance is
>>>>> greatly appreciated.

>>>>>
>>> no firewall. i tested with and without iptables. in fact, i even unloaded
>>> ALL iptables modules just to be certain. so then i thought to myself:
>>> pax? perhaps grsecurity? no luck there either. i rebuilt a kernel
>>> without any of the grsec and pax hooks. no luck.
>>>
>>>
>>>
>>> destiny crm # lsmod
>>> Module                  Size  Used by
>>> softdog                 4752  0
>>> tun                     9184  0
>>> e100                   28360  0
>>> sym53c8xx              64820  0
>>> eepro100               25552  0
>>> scsi_transport_spi     18752  1 sym53c8xx
>>>
>>> destiny crm # ping 127.0.0.1
>>> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>>> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.097 ms
>>> 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.054 ms
>>>
>>> --- 127.0.0.1 ping statistics ---
>>> 2 packets transmitted, 2 received, 0% packet loss, time 1002ms
>>> rtt min/avg/max/mdev = 0.054/0.075/0.097/0.023 ms
>>>
>>>
>>> so i re-ran BasicSanityCheck... same result. any ideas?
>> Here is something to run and check...
>>
>> ifconfig lo;ip addr show lo; route;ip route show
>>
>> Here's what it produces on my machine:
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           inet6 addr: ::1/128 Scope:Host
>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>           RX packets:520006 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:520006 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:190990507 (182.1 Mb)  TX bytes:190990507 (182.1 Mb)
>>
>> 1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 10.10.10.0      *               255.255.255.0   U     0      0        0 eth1
>> link-local      *               255.255.0.0     U     0      0        0 eth1
>> loopback        *               255.0.0.0       U     0      0        0 lo
>> default         gw              0.0.0.0         UG    0      0        0 eth1
>> 10.10.10.0/24 dev eth1  proto kernel  scope link  src 10.10.10.5
>> 169.254.0.0/16 dev eth1  scope link
>> 127.0.0.0/8 dev lo  scope link
>> default via 10.10.10.254 dev eth1
>>
>>
>> I don't know what I'm looking for to be different, but it's at least
>> somewhere to start...
>>
>>
> 
> i know. there's not much to go on here... and i'm not a coder, but perhaps
> someone can look in glib as to why/when that error function would
> actually get called?
> 
> 
> 
> destiny crm # ip addr show dev lo
> 2: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
> 
> 
> destiny crm # ip route show
> 10.254.254.0/30 dev eth1  proto kernel  scope link  src 10.254.254.2
> x.x.x.x/28 dev eth0  proto kernel  scope link  src 63.147.188.174
> 10.0.0.0/24 dev eth2  proto kernel  scope link  src 10.0.0.2
> 127.0.0.0/8 dev lo  scope link
> default via x.x.x.x dev eth0
> 
> 
> destiny crm # ifconfig lo
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:37780 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:37780 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:5114592 (4.8 Mb)  TX bytes:5114592 (4.8 Mb)

I wrote the code that's printing that error message. I know why it's
coming out: the attempt to send the packet to the loopback interface
failed.

    2007/04/11_12:44:05 ERROR: glib: Error sending packet: Operation not permitted
    2007/04/11_12:44:05 ERROR: write failure on ping 127.0.0.1.: Operation not permitted

These come from this statement in the code failing:
        if ((rc=sendto(ei->sock, (void *) icmp_pkt, pktsize, 0
        ,       (struct sockaddr *)&ei->addr
        ,       sizeof(struct sockaddr))) != (ssize_t)pktsize) {
                PILCallLog(LOG, PIL_CRIT, "Error sending packet: %s"
                ,       strerror(errno));
                FREE(icmp_pkt);
                ha_msg_del(msg);
                return(HA_FAIL);
        }

We're trying to write to a raw socket (SOCK_RAW).  This won't work
unless you're root.  But, I think it should fail on the socket() call,
not on the sendto().
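To tell the two possible failure points apart on a given box, here is a minimal C sketch. It is not the heartbeat plugin code: `icmp_checksum`, `build_echo` and `probe_raw_icmp` are hypothetical names, and it assumes Linux, IPv4 and the RFC 1071 Internet checksum. It builds one ICMP echo request, opens a raw socket, sends the packet to 127.0.0.1 and reports whether socket() or sendto() failed, and with which errno. On Linux an unprivileged raw socket() normally fails with EPERM, so an EPERM that only appears at sendto() would point somewhere other than the user id.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Which call in the probe failed. */
enum probe_stage { PROBE_OK, PROBE_SOCKET, PROBE_SENDTO };

/* RFC 1071 Internet checksum, summing big-endian 16-bit words. */
static uint16_t icmp_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;          /* odd trailing byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}

/* Hypothetical helper: write an ICMP echo request (8-byte header plus
 * payload) into buf, which must hold 8 + payload_len bytes. */
static size_t build_echo(uint8_t *buf, uint16_t id, uint16_t seq,
                         const uint8_t *payload, size_t payload_len)
{
    buf[0] = 8;                              /* type 8: echo request */
    buf[1] = 0;                              /* code 0 */
    buf[2] = 0;                              /* checksum, filled in below */
    buf[3] = 0;
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xFF);
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xFF);
    memcpy(buf + 8, payload, payload_len);

    uint16_t ck = icmp_checksum(buf, 8 + payload_len);
    buf[2] = (uint8_t)(ck >> 8);             /* store in network byte order */
    buf[3] = (uint8_t)(ck & 0xFF);
    return 8 + payload_len;
}

/* Hypothetical helper: send one echo request to 127.0.0.1 over a raw
 * socket and report which call failed. *err gets errno, or 0. */
static enum probe_stage probe_raw_icmp(int *err)
{
    static const uint8_t payload[] = { 'h', 'b', '-', 'p', 'r', 'o', 'b', 'e' };
    uint8_t pkt[8 + sizeof payload];
    size_t len = build_echo(pkt, (uint16_t)getpid(), 1,
                            payload, sizeof payload);

    int sock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (sock < 0) {
        *err = errno;                        /* typically EPERM without CAP_NET_RAW */
        return PROBE_SOCKET;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    ssize_t rc = sendto(sock, pkt, len, 0,
                        (struct sockaddr *)&addr, sizeof addr);
    int saved = errno;                       /* close() may overwrite errno */
    close(sock);

    if (rc != (ssize_t)len) {
        *err = (rc < 0) ? saved : 0;         /* errno is meaningless on a short write */
        return PROBE_SENDTO;
    }
    *err = 0;
    return PROBE_OK;
}
```

Calling probe_raw_icmp() once as root and once as an ordinary user (for example from a small main()) should show which stage fails in each case. If it succeeds as root while heartbeat still logs EPERM from sendto(), that would suggest looking at what differs in the heartbeat process or the packet path rather than at raw socket permissions.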


But, I believe we're getting EPERM.
#define      EPERM            1      /* Operation not permitted */

I can't find any documented reasons for sendto(2) to return EPERM.

But, I did see a case on the mailing list where a sendto(2) on UDP
returned EPERM because of a firewall...

But it would be good anyway to check the effective user id that
heartbeat's write processes are running under, to make sure they're root.

I don't know how/why they wouldn't be, but it seems good to check anyway.
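One way to do that check is to read /proc/<pid>/status, as in the small C sketch below. It assumes Linux's /proc layout, and the helper names (`read_status_line`, `proc_effective_uid`, `proc_has_cap_net_raw`) are hypothetical. Besides the effective UID (the second field of the "Uid:" line), it looks at the "CapEff:" mask for CAP_NET_RAW (capability number 13), which is the capability the kernel checks when a raw socket is opened.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the line of /proc/<pid>/status (Linux-only)
 * that starts with key into out. Returns 0 on success, -1 otherwise. */
static int read_status_line(pid_t pid, const char *key, char *out, size_t outlen)
{
    char path[64];
    char line[256];
    int found = -1;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, strlen(key)) == 0) {
            snprintf(out, outlen, "%s", line);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Effective UID of pid (second field of "Uid:"), or -1 on error. */
static long proc_effective_uid(pid_t pid)
{
    char line[256];
    long ruid, euid;

    if (read_status_line(pid, "Uid:", line, sizeof line) != 0)
        return -1;
    /* Line format: "Uid:\t<real>\t<effective>\t<saved>\t<fs>" */
    if (sscanf(line, "Uid: %ld %ld", &ruid, &euid) != 2)
        return -1;
    return euid;
}

/* 1 if CAP_NET_RAW (bit 13) is in pid's effective capability set,
 * 0 if not, -1 on error. */
static int proc_has_cap_net_raw(pid_t pid)
{
    char line[256];
    unsigned long long caps;

    if (read_status_line(pid, "CapEff:", line, sizeof line) != 0)
        return -1;
    if (sscanf(line, "CapEff: %llx", &caps) != 1)
        return -1;
    return (int)((caps >> 13) & 1ULL);
}
```

These could be pointed at the heartbeat PIDs that `ps -ef` lists; for a process that opens and writes to a raw socket, one would expect an effective UID of 0 and CAP_NET_RAW set.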



-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[EMAIL PROTECTED]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
