Re: [etherlab-users] r8169 patch - packet timeout boot failures

Raz Tue, 03 Dec 2013 04:01:47 -0800

The driver never fails to load: I know this because I am getting the timeout
errors. I did not get a crash/panic but some sort of system lockup (various
user-space net daemons stopped the boot process, probably because the
interface bring-up was in some error state; I think some netlink socket was
hanging). That is when I added the spinlocks, which appear to stop the hang.
I think that once I patch the e1000e we might know more about why this is
happening. Please note that this problem happens on my various Intel PC
boards and is not bound to a single type of board.
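
For reference, the load-order hypothesis quoted below can be modeled in user
space. This is only a sketch under stated assumptions: every name in it
(fake_master, fake_nic, fake_offer, fake_probe) is hypothetical and none of it
is the actual r8169 or EtherLab master code. It only shows how a probe that
finds no master would fall back to the regular net device path with
interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT real kernel or EtherLab symbols. */
enum claim_mode { MODE_ETHERCAT, MODE_NETDEV };

struct fake_master {
    bool loaded;                      /* true once the master module is present */
};

struct fake_nic {
    const struct fake_master *ecdev;  /* non-NULL once a master has claimed the NIC */
    bool irqs_enabled;                /* true only on the regular netdev path */
};

/* Models the "offer" step: it succeeds only if a master is already loaded. */
static const struct fake_master *fake_offer(const struct fake_master *m)
{
    return (m != NULL && m->loaded) ? m : NULL;
}

/* Models the probe: claimed devices are polled, unclaimed ones use interrupts. */
static enum claim_mode fake_probe(struct fake_nic *nic, const struct fake_master *m)
{
    assert(nic != NULL);

    nic->ecdev = fake_offer(m);
    if (nic->ecdev != NULL) {
        nic->irqs_enabled = false;    /* all work is done from the poll function */
        return MODE_ETHERCAT;
    }

    nic->irqs_enabled = true;         /* fallback: ordinary network interface */
    return MODE_NETDEV;
}
```

Calling fake_probe() with a master whose loaded flag is false models "driver
loaded first" and returns MODE_NETDEV; calling it again after setting the flag
models "master loaded first" and returns MODE_ETHERCAT. On a real system the
equivalent check would be whether the interface still shows up as a normal
network device after boot.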




On Tue, Dec 3, 2013 at 1:50 PM, Jeroen Van den Keybus <
[email protected]> wrote:

> Just a thought since you mention booting: is it possible your driver is
> sometimes simply loaded before the master is (and fails to register) ? You
> mention that you crashed upon boot without the spinlocks and the only way
> to do that should be that you run as a regular netdev device (line 4170)
> incl. irqs. Could also explain why the e1000 has the problem.
>
> I suspect that adding the link status check merely causes an extra delay
> which could lead to the master being loaded earlier.
>
> J.
>
>
> 2013/12/3 Raz <[email protected]>
>
>> All I am doing is trial and error. I do not know the Realtek driver at
>> all.
>> The spinlocks are needed because this call is protected by them in the
>> original driver code flow. I had a boot lockup in one of my trials without
>> them. This patch does not eliminate the problem entirely, but going from
>> 100% failures (10 trials with 6 drives) to 1 failure out of 10, I believe
>> it is important enough to mail to the community. As for e1000e, I do not
>> know what the problem is; I need to check it and email you.
>>
>>
>>
>> On Tue, Dec 3, 2013 at 1:16 PM, Jeroen Van den Keybus <
>> [email protected]> wrote:
>>
>>> Why the spinlock ? This driver instance shouldn't ever be reentering.
>>>
>>> I'm a bit worried that it would complicate the use of e.g. RTAI and
>>> Xenomai.
>>>
>>> How come the e1000 has the same issue ?
>>>
>>> J.
>>>
>>>
>>>
>>> 2013/12/3 Raz <[email protected]>
>>>
>>>> The patch below seemed to eliminate the problem. I believe the problem
>>>> relates to resetting some registers when link up is detected.
>>>>
>>>> diff --git a/local_src/r8169-3.2/r8169.c b/local_src/r8169-3.2/r8169.c
>>>> index 6df1793..a483fb5 100644
>>>> --- a/local_src/r8169-3.2/r8169.c
>>>> +++ b/local_src/r8169-3.2/r8169.c
>>>> @@ -1290,6 +1290,9 @@ static void __rtl8169_check_link_status(struct net_device *dev,
>>>>
>>>>         if (tp->ecdev) {
>>>>                 ecdev_set_link(tp->ecdev, tp->link_ok(ioaddr) ? 1 : 0);
>>>> +               spin_lock_irqsave(&tp->lock, flags);
>>>> +               rtl_link_chg_patch(tp);
>>>> +               spin_unlock_irqrestore(&tp->lock, flags);
>>>>                 return;
>>>>         }
>>>>
>>>>
>>>>
>>>> On Tue, Dec 3, 2013 at 11:56 AM, Jeroen Van den Keybus <
>>>> [email protected]> wrote:
>>>>
>>>>> Perhaps try hooking up a normal eth interface to the drive and see
>>>>> what the autoneg comes up with using ethtool. In the past, I have had
>>>>> trouble interfacing an FPGA IP core to a PC Ethernet card when the core
>>>>> was hard wired to 100M FD instead of advertising this using autoneg. The
>>>>> PC card tried to autoneg and then fell back to 100M HD.
>>>>>
>>>>> You could try testing with an EK1100 in between the PC and the drive.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> 2013/12/3 Raz <[email protected]>
>>>>>
>>>>>> I do not have ethtool on the EtherCAT device, as the interface is
>>>>>> removed. How can I tell ? eth0 is 100Mbps, but it is my public
>>>>>> interface. eth1 is my EtherCAT interface.
>>>>>>
>>>>>> There is always a link. The first slave is a drive, not an I/O device.
>>>>>> This drive runs on a Xilinx device with the port stack and IP core from
>>>>>> Beckhoff. I am now trying to debug the Realtek driver, let's see...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 3, 2013 at 11:36 AM, Jeroen Van den Keybus <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> It would be very useful to know whether e.g. the interfaces ended up
>>>>>>> in 100M half duplex or so. Is there a link in those cases ? What's the
>>>>>>> first EtherCAT station ? Maybe it doesn't handle autoneg properly during
>>>>>>> its reset phase ?
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/3 Raz <[email protected]>
>>>>>>>
>>>>>>>> Hey,
>>>>>>>> The problem happens with the Intel e1000e as well as the Realtek. One
>>>>>>>> way to bypass it is to boot the master while the Ethernet-EtherCAT
>>>>>>>> cable is disconnected and, once the master claims the interface,
>>>>>>>> connect the cable. This appears to work.
>>>>>>>> So there is some sort of initialisation error.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 2, 2013 at 11:32 AM, Raz <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I still do not have a reproducible scenario; it "sometimes" happens.
>>>>>>>>> The -DRTL8169_DEBUG flag is something I did not know about, so I
>>>>>>>>> will check and see. Thx.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 2, 2013 at 11:27 AM, Jeroen Van den Keybus <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Is there a difference between cold and warm boot ? Does unloading
>>>>>>>>>> the ec driver, loading/unloading the stock r8169 driver and then
>>>>>>>>>> reloading the ec driver work better ? Same scenario but with Realtek
>>>>>>>>>> drivers (r8168) ? Also perhaps compile with -DRTL8169_DEBUG ?
>>>>>>>>>>
>>>>>>>>>> Just some thoughts.
>>>>>>>>>>
>>>>>>>>>> J.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013/12/2 Raz <[email protected]>
>>>>>>>>>>
>>>>>>>>>>> The timeouts happen after the system boots, not while the slaves
>>>>>>>>>>> are in OP mode, so my transmit path is irrelevant here. In any
>>>>>>>>>>> case, a transmit happens only from a single thread or through an
>>>>>>>>>>> ioctl (SDO reads and so on).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 2, 2013 at 11:01 AM, Jeroen Van den Keybus <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> 1. why do you disable the rtl8169_phy_timer  timer ?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The rtl8169_phy_timer is regularly polled in ec_poll instead.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> 2.  In rtl_hw_start_8168 : why do you disable RTL_W16(IntrMask,
>>>>>>>>>>>>> tp->intr_event); ?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> The drivers are all non-blocking and interrupt-free. All work
>>>>>>>>>>>> that interrupt handlers normally do is done in ec_poll instead.
>>>>>>>>>>>>
>>>>>>>>>>>> If you cannot send packets anymore, I suspect that you may have
>>>>>>>>>>>> overrun the tx queue, i.e. sent a packet before the previous one
>>>>>>>>>>>> has been completed. You're also not calling the ethercat
>>>>>>>>>>>> transmission functions from different threads, right ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> thank you
>>>>>>>>>>>>> raz
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> https://sites.google.com/site/ironspeedlinux/
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> etherlab-users mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> http://lists.etherlab.org/mailman/listinfo/etherlab-users


-- 
https://sites.google.com/site/ironspeedlinux/

_______________________________________________
etherlab-users mailing list
[email protected]
http://lists.etherlab.org/mailman/listinfo/etherlab-users
