connman Digest, Vol 46, Issue 10

connman-request Wed, 14 Aug 2019 12:03:03 -0700

Send connman mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.01.org/mailman/listinfo/connman
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of connman digest..."

Today's Topics:

   1. Sending DHCP release on incorrect iface when iface goes down
      (Joakim Lotsengård)
   2. Re: Unsalble cellular connection (JH)
   3. Re: Unsalble cellular connection (Jonas Bonn)

----------------------------------------------------------------------

Message: 1
Date: Wed, 14 Aug 2019 07:17:49 +0000
From: Joakim Lotsengård <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Sending DHCP release on incorrect iface when iface goes down
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Good morning!

We have seen a DHCP RELEASE packet being sent on the wrong interface 
(eth0) when another interface (mlan0) goes down. Sadly, I have not yet 
had time to debug this completely, and I apologize in advance for not 
having tested my theory properly. I would like to ask whether anyone 
else has observed this problem, to aid my debugging.

We have an embedded Linux device with Ethernet (wired), WiFi and 
cellular uplink connections. (It also has WiFi downlink/AP interfaces 
that are not controlled by connman.) The error can be reproduced with 
just Ethernet (eth0 in our case) and WiFi (mlan0 in our case) 
connected. It also reproduces with the cellular interface in place of 
WiFi.

Start a tcpdump on eth0 (Ethernet) filtering for DHCP packets, then take 
down the mlan0 (WiFi) interface. You will see the DHCP release packet 
for the IP on mlan0 go out on eth0. To make matters worse, it is sent 
with the srcIP of eth0 (correct for that interface) but the dstIP of the 
DHCP server that was on mlan0. The srcIP is probably that of eth0 
because of the SNAT rule that connman added in iptables -t nat. If that 
rule is removed, the old IP of mlan0 is used as srcIP instead.

My conclusion is that connman kind of "correctly" detects that mlan0 
went down and wants to cancel its DHCP lease. It creates the DHCP 
release packet correctly, with the srcIP of the downed iface and the 
dstIP of the DHCP server that leased us the IP. The bad part is that the 
interface is now down, so the Linux kernel does its best to route the 
packet: it takes the default path (eth0) and sends it out. In our case 
ip_forward is enabled because of internal NAT in the device.
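
The routing behaviour described above can be illustrated with a small 
sketch. It is not connman or kernel code; it only mimics a 
longest-prefix-match lookup over a hypothetical routing table. The 
172.16.0.0/24 prefix is a made-up stand-in for the hidden eth0 subnet, 
and the table layout is an assumption; 192.168.1.0/24 and 192.168.1.1 
come from the dump further down in this mail.

```python
import ipaddress

def lookup(routes, dst):
    """Return the interface of the most specific route matching dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None

# Hypothetical table while both links are up.
routes_up = [
    ("0.0.0.0/0", "eth0"),        # default route via Ethernet
    ("172.16.0.0/24", "eth0"),    # made-up stand-in for the hidden eth0 subnet
    ("192.168.1.0/24", "mlan0"),  # WiFi subnet from the dump
]
# Assumption: once mlan0 is down, its connected route is gone.
routes_down = [r for r in routes_up if r[1] != "mlan0"]

print(lookup(routes_up, "192.168.1.1"))    # mlan0
print(lookup(routes_down, "192.168.1.1"))  # eth0
```

With the mlan0 route gone, the only remaining match for the DHCP server 
address is the default route, which is consistent with the release 
showing up on eth0.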

One reservation: we do have other complex routing set up in our device. 
For example, we have downlink WiFi interfaces as well and are doing 
routing/NAT in a non-straightforward setup. I have not yet had the time 
to disable all of that and re-test.

I will try to investigate this further but wanted to ask whether anyone 
else has seen this issue. My guess is that a fix is to not send the DHCP 
release if the interface we want to release the IP for is down. That 
kind of makes sense: there must be very few cases where the DHCP release 
packet can be routed to the correct DHCP server via another interface.
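
A rough sketch of such a guard, written in Python rather than connman's 
C and not taken from the connman source: only attempt the release while 
the interface still has IFF_UP set. All function names here are 
hypothetical. The IFF_UP value is the Linux one from <linux/if.h>, and 
/sys/class/net/<iface>/flags is the sysfs file that exposes the flags as 
a hex string.

```python
IFF_UP = 0x1  # administrative "up" flag, from <linux/if.h>

def parse_flags(text):
    """Parse the hex string found in /sys/class/net/<iface>/flags."""
    return int(text.strip(), 16)

def should_send_release(flags):
    """Hypothetical guard: only release while the interface is still up."""
    return bool(flags & IFF_UP)

def read_flags(iface):
    """Read the current flags of iface from sysfs (Linux only)."""
    with open(f"/sys/class/net/{iface}/flags") as f:
        return parse_flags(f.read())

# Example values: 0x1003 = UP|BROADCAST|MULTICAST, 0x1002 = same without UP.
print(should_send_release(parse_flags("0x1003\n")))  # True
print(should_send_release(parse_flags("0x1002\n")))  # False
```

On the device, should_send_release(read_flags("mlan0")) would be 
evaluated just before the release is built. Whether connman has a 
suitable place for such a check is not something this sketch answers.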

The connman version used is the latest, 1.37. However, we also tested 
some of our older firmware, and (at least) 1.36 has the problem as well.

Some debugging output:
(I've hidden the true IP of eth0 due to privacy.)

Run and wait for output in one terminal: (Dump ethernet)
$ tcpdump -envvvs 0 -i eth0 "udp and (port 68 or port 67)"

In another terminal run: (Take down a connected wifi interface)
$ ip link set dev mlan0 down

The first terminal gives us this:
10:32:14.802697 00:23:c1:1a:54:11 > 58:97:bd:24:bb:48, ethertype IPv4 
(0x0800), length 590: (tos 0x0, ttl 64, id 34388, offset 0, flags [DF], 
proto UDP (17), length 576)
     172.X.Y.12.68 > 192.168.1.1.67: [bad udp cksum 0xe011 -> 0x3d80!] 
BOOTP/DHCP, Request from 24:c3:f9:00:08:89, length 548, xid 0x6eecdf95, 
Flags [none] (0x0000)
          Client-IP 192.168.1.100
          Client-Ethernet-Address 24:c3:f9:00:08:89
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Release
            Server-ID Option 54, length 4: 192.168.1.1
            END Option 255, length 0
            PAD Option 0, length 0, occurs 298

IP 172.X.Y.12 is the IP of eth0. 192.168.1.0/24 is (was) the network of 
mlan0.

If I run iptables -t nat -F (clear the nat tables) the dump is:
root@lev-26F7A6CY:/usr/sbin# tcpdump -envvvs 0 -i eth0 "udp and (port 68 
or port 67)"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 
262144 bytes
11:45:13.316968 00:23:c1:1a:54:11 > 58:97:bd:24:bb:48, ethertype IPv4 
(0x0800), length 590: (tos 0x0, ttl 64, id 31652, offset 0, flags [DF], 
proto UDP (17), length 576)
     192.168.1.100.68 > 192.168.1.1.67: [bad udp cksum 0x85f3 -> 
0xf9f3!] BOOTP/DHCP, Request from 24:c3:f9:00:08:89, length 548, xid 
0x36bfb56d, Flags [none] (0x0000)
          Client-IP 192.168.1.100
          Client-Ethernet-Address 24:c3:f9:00:08:89
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Release
            Server-ID Option 54, length 4: 192.168.1.1
            END Option 255, length 0
            PAD Option 0, length 0, occurs 298

IP addresses of the device when both eth0 and mlan0 were up (wwan0 and 
wwan1 are cellular and not used in my test; uap0 is a downlink/AP WiFi 
interface, not handled by connman):

root@lev-26F7A6CY:~# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 qdisc mq 
state UP group default qlen 1000
     link/ether 00:23:c1:1a:54:11 brd ff:ff:ff:ff:ff:ff
     inet 172.X.Y.12/24 brd 172.X.Y.255 scope global eth0
        valid_lft forever preferred_lft forever
18: mlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
group default qlen 1000
     link/ether 24:c3:f9:00:08:89 brd ff:ff:ff:ff:ff:ff
     inet 192.168.1.100/24 brd 192.168.1.255 scope global mlan0
        valid_lft forever preferred_lft forever
19: uap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
group default qlen 1000
     link/ether 24:c3:f9:00:08:8a brd ff:ff:ff:ff:ff:ff
     inet 192.168.5.1/24 scope global uap0
        valid_lft forever preferred_lft forever
20: uap1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group 
default qlen 1000
     link/ether 00:50:43:02:00:01 brd ff:ff:ff:ff:ff:ff
21: wwan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group 
default qlen 1000
     link/ether fa:96:11:12:13:14 brd ff:ff:ff:ff:ff:ff
22: wwan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group 
default qlen 1000
     link/ether fa:96:11:12:13:16 brd ff:ff:ff:ff:ff:ff

I can easily get the full debug output of connman when I take the mlan0 
interface down, if needed. However, it was too long to post in a mail, 
and I might also need to scrub private data (IPs) from the log.

Also, I have no idea why tcpdump thinks the UDP packet has an incorrect 
checksum. The error was discovered because an ISP reacted to getting 
incorrect DHCP releases from customers using our devices.

Best regards,
Joakim Lotsengård

------------------------------

Message: 2
Date: Wed, 14 Aug 2019 21:11:59 +1000
From: JH <[email protected]>
To: Jonas Bonn <[email protected]>
Cc: Giacinto Cifelli <[email protected]>, "[email protected]"
        <[email protected]>, connman <[email protected]>
Subject: Re: Unsalble cellular connection
Message-ID:
        <CAA=hcWTH2Ayb=OtWim7h_AX+0haoq_GZ5ZYGHbOLkCMq+zr6=a...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi Jonas,

On 8/14/19, Jonas Bonn <[email protected]> wrote:
> So you have no traffic over the LTE link?  How do you know that it goes
> down after 1-2 hours... maybe it went down after 5 minutes?

I logged link status every 10 minutes.

> Cloud messages are as good as a ping... a packet's a packet.
> The link is configured by connman, not ofono.  ofono just publishes the
> address details so that connman can set it up.

So, it is back to connman?

>>> No Address?  No Interface?  Running ofono with debug output will
>>> probably shed some light on what's going on here.
>>
>> I think that could be because I only ran it once. When I ran
>> list-contexts 4 times consecutively, it displayed different results
>> each time, including address and interface. Is that the normal
>> behaviour of list-contexts?
>
> Sounds broken...???

Which one is broken: ofono, connman or something else? There is not
much configuration for either ofono or connman.

Could defective hardware, particularly the system power supply, cause
the link to go down? I have only one test device, so I cannot prove it.
The SARA chip has a power-up pin. During boot, a command sets the
power-up pin before ofono starts. When the link is down, either
restarting ofonod or running the command to power the chip up again
brings the link back. I cannot imagine the chip losing power, but if
the system power supply is unstable and at some stage drops too low to
support the SARA chip, will that trigger a power-down? I talked to the
hardware engineer, who rejected the idea as not making sense because
WiFi was running fine; he is pointing the finger at the modem manager
and connection manager.

Thank you.

Kind regards,

- jh

------------------------------

Message: 3
Date: Wed, 14 Aug 2019 15:00:09 +0200
From: Jonas Bonn <[email protected]>
To: JH <[email protected]>
Cc: Giacinto Cifelli <[email protected]>, "[email protected]"
        <[email protected]>, connman <[email protected]>
Subject: Re: Unsalble cellular connection
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi JH,

On 14/08/2019 13:11, JH wrote:
> Hi Jonas,
> 
> On 8/14/19, Jonas Bonn <[email protected]> wrote:
>> So you have no traffic over the LTE link?  How do you know that it goes
>> down after 1-2 hours... maybe it went down after 5 minutes?
> 
> I logged link status every 10 minutes.

How?  It is not clear from what you posted earlier how you can tell that 
the link has gone down.  The default route is set to go over WiFi so a 
generic ping isn't sufficient.  What exactly are you monitoring?
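
One way to probe a specific link rather than the default route, as a 
hedged sketch: send the ping out through a named interface with the -I 
option of iputils ping. The interface name wwan0 and the target address 
192.0.2.1 (a documentation address) are placeholders, not values from 
this thread, and the helper names are hypothetical.

```python
import subprocess

def ping_command(iface, host, count=1, timeout_s=5):
    """Build an iputils ping command that sends via a specific interface."""
    return ["ping", "-I", iface, "-c", str(count), "-W", str(timeout_s), host]

def link_alive(iface, host):
    """Return True if host answers a ping sent out through iface."""
    result = subprocess.run(ping_command(iface, host),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

# Placeholder interface and address; substitute the real cellular
# interface and a host known to be reachable over it.
print(ping_command("wwan0", "192.0.2.1"))
```

Logging link_alive() for the cellular interface at each interval would 
show whether traffic actually passes over that link, independent of 
where the default route points.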

> 
>> Cloud messages are as good as a ping... a packet's a packet.
>> The link is configured by connman, not ofono.  ofono just publishes the
>> address details so that connman can set it up.
> 
> So, it is back to connman?

Well, the log _seemed_ to indicate that connman does an NTP request to a 
server that isn't available and thereby decides to take the link down... 
off the top of my head I can't say whether that's something connman 
actually does so hopefully somebody else will jump in here.  Did ofono 
indicate the context as still established after that?

> 
>>>> No Address?  No Interface?  Running ofono with debug output will
>>>> probably shed some light on what's going on here.
>>>
>>> I think that could be because I only ran it once. When I ran
>>> list-contexts 4 times consecutively, it displayed different results
>>> each time, including address and interface. Is that the normal
>>> behaviour of list-contexts?
>>
>> Sounds broken...???
> 
> Which one is broken: ofono, connman or something else? There is not
> much configuration for either ofono or connman.

I was referring to the DBus response containing incomplete Settings 
objects.  A random set of properties at each request is not the 
documented behaviour.

> 
> Could defective hardware, particularly the system power supply, cause
> the link to go down? I have only one test device, so I cannot prove it.
> The SARA chip has a power-up pin. During boot, a command sets the
> power-up pin before ofono starts. When the link is down, either
> restarting ofonod or running the command to power the chip up again
> brings the link back. I cannot imagine the chip losing power, but if
> the system power supply is unstable and at some stage drops too low to
> support the SARA chip, will that trigger a power-down? I talked to the
> hardware engineer, who rejected the idea as not making sense because
> WiFi was running fine; he is pointing the finger at the modem manager
> and connection manager.

Seeing the issue with complete logs would be useful.  I thought your 
previous logs indicated connman trying to deactivate the context when it 
took the link down, but the ofono logs didn't show any of that... what's 
going on there?

/Jonas

> 
> Thank you.
> 
> Kind regards,
> 
> - jh
> 

------------------------------

End of connman Digest, Vol 46, Issue 10
***************************************
