Just a thought, since you mention booting: is it possible that your driver is sometimes simply loaded before the master is (and so fails to register with it)? You mention that you crashed upon boot without the spinlocks, and the only way that should be possible is if the driver runs as a regular netdev device (line 4170), including IRQs. That could also explain why the e1000 has the same problem.
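To make the load-order idea concrete, here is a rough userspace sketch of the decision I have in mind. It is not the driver code: sim_master, sim_nic, sim_offer_to_master and sim_probe are made-up stand-ins, and the only assumption taken from the driver is that a device the master does not claim carries on as a regular netdev with IRQs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the real driver or master API. */
struct sim_master {
    bool loaded;             /* has the master module been loaded yet? */
};

struct sim_nic {
    bool claimed_by_master;  /* polled by the master, no interrupts */
    bool irqs_enabled;       /* running as a regular netdev device */
};

/* The master can only accept a device if it is already loaded. */
bool sim_offer_to_master(const struct sim_master *m)
{
    return m != NULL && m->loaded;
}

/* Probe-time decision: offer the device first, otherwise fall back to netdev mode. */
void sim_probe(struct sim_nic *nic, const struct sim_master *m)
{
    assert(nic != NULL);
    nic->claimed_by_master = sim_offer_to_master(m);
    nic->irqs_enabled = !nic->claimed_by_master;
}
```

If sim_probe runs while loaded is still false, the device ends up with irqs_enabled set, which is the situation where missing locking could bite. Running it again once loaded is true gives the polled, interrupt-free mode.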
I suspect that adding the link status check merely causes an extra delay which could lead to the master being loaded earlier.

J.

2013/12/3 Raz <[email protected]>

> All i am doing is more of a trial and error. I do not know the realtek
> driver at all.
> The spinlock are needed because they are protected in the original driver
> code flow . i had a boot lockup in one of my trials without them. This
> patch does not eliminate the problem entirely, but from 10 trials with 6
> drives with a 100% failures to 1 out of 10 I believe it important enough to
> mail to the community. as for e1000e i do not know what the problem is, i
> need to check it and email you.
>
> On Tue, Dec 3, 2013 at 1:16 PM, Jeroen Van den Keybus <
> [email protected]> wrote:
>
>> Why the spinlock ? This driver instance shouldn't ever be reentering.
>>
>> I'm a bit worried that it would complicate the use of e.g. RTAI and
>> Xenomai.
>>
>> How comes the e1000 has the same issue ?
>>
>> J.
>>
>> 2013/12/3 Raz <[email protected]>
>>
>>> The bellow patch seemed to eliminate the problem. I believe the problem
>>> relates to resetting some registers when link up is detected.
>>>
>>> diff --git a/local_src/r8169-3.2/r8169.c b/local_src/r8169-3.2/r8169.c
>>> index 6df1793..a483fb5 100644
>>> --- a/local_src/r8169-3.2/r8169.c
>>> +++ b/local_src/r8169-3.2/r8169.c
>>> @@ -1290,6 +1290,9 @@ static void __rtl8169_check_link_status(struct net_device *dev,
>>>
>>>         if (tp->ecdev) {
>>>                 ecdev_set_link(tp->ecdev, tp->link_ok(ioaddr) ? 1 : 0);
>>> +               spin_lock_irqsave(&tp->lock, flags);
>>> +               rtl_link_chg_patch(tp);
>>> +               spin_unlock_irqrestore(&tp->lock, flags);
>>>                 return;
>>>         }
>>>
>>> On Tue, Dec 3, 2013 at 11:56 AM, Jeroen Van den Keybus <
>>> [email protected]> wrote:
>>>
>>>> Perhaps try hooking up a normal eth interface to the drive and see what
>>>> the autoneg comes up with using ethtool. In the past, I have had trouble
>>>> interfacing an FPGA IP core to a PC Ethernet card when the core was hard
>>>> wired to 100M FD instead of advertising this using autoneg. The PC card
>>>> tried to autoneg and then fell back to 100M HD.
>>>>
>>>> You could try testing with an EK1100 in between the PC and the drive.
>>>>
>>>> J.
>>>>
>>>> 2013/12/3 Raz <[email protected]>
>>>>
>>>>> I do not have ethtool over the ethercat device as it is removed. How
>>>>> can I tell ? eth0 is 100Mbps but it is my public interface. eth1 is my
>>>>> ethercat interface.
>>>>>
>>>>> There is always a link. the first slave is a drive, not an io device
>>>>> . This drive is running xilinix with port stack and ip core of beckhof.
>>>>> I am trying to debug now the realtek driver, let see...
>>>>>
>>>>> On Tue, Dec 3, 2013 at 11:36 AM, Jeroen Van den Keybus <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> It would be very useful to know whether e.g. the interfaces ended up
>>>>>> in 100M half duplex or so. Is there a link in those cases ? What's the
>>>>>> first EtherCAT station ? Maybe it doesn't handle autoneg properly during
>>>>>> its reset phase ?
>>>>>>
>>>>>> J.
>>>>>>
>>>>>> 2013/12/3 Raz <[email protected]>
>>>>>>
>>>>>>> hey
>>>>>>> Problem happens with intel e1000e as well as realtek. One way to
>>>>>>> bypass it is to boot the master while the ethernet-ethercat cable is
>>>>>>> disconnected, and once master claims the interface , connect this cable.
>>>>>>> This appears to work.
>>>>>>> So , There some sort of of initialisation error.
>>>>>>>
>>>>>>> On Mon, Dec 2, 2013 at 11:32 AM, Raz <[email protected]> wrote:
>>>>>>>
>>>>>>>> I still do not have a scenario. it "sometimes" happens. The
>>>>>>>> -DRTL8169_DEBUG is something i did not know, so i will check and see.
>>>>>>>> thx
>>>>>>>>
>>>>>>>> On Mon, Dec 2, 2013 at 11:27 AM, Jeroen Van den Keybus <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Is there a difference between cold and warm boot ? Does unloading
>>>>>>>>> the ec driver, loading/unloading the stock r8169 driver and then
>>>>>>>>> reloading the ec driver work better ? Same scenario but with Realtek
>>>>>>>>> drivers (r8168) ? Also perhaps compile with -DRTL8169_DEBUG ?
>>>>>>>>>
>>>>>>>>> Just some thoughts.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>> 2013/12/2 Raz <[email protected]>
>>>>>>>>>
>>>>>>>>>> The timeouts happens after the system boots and not while slaves
>>>>>>>>>> are in in OP mode. So my transmit is irrelevant here, even though a
>>>>>>>>>> transmit happens only from a single thread of through an ioctl ( SDO
>>>>>>>>>> reads and so on..)
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 11:01 AM, Jeroen Van den Keybus <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>> 1. why do you disable the rtl8169_phy_timer timer ?
>>>>>>>>>>>
>>>>>>>>>>> The rtl8169_phy_timer is regularly polled in ec_poll instead.
>>>>>>>>>>>
>>>>>>>>>>>> 2. In rtl_hw_start_8168 : why do disable RTL_W16(IntrMask,
>>>>>>>>>>>> tp->intr_event); ?
>>>>>>>>>>>
>>>>>>>>>>> The drivers are all non-blocking and interrupt-free. All work
>>>>>>>>>>> that interrupt handlers normally do is done in ec_poll instead.
>>>>>>>>>>>
>>>>>>>>>>> If you cannot send packets anymore, I suspect that you may have
>>>>>>>>>>> overrun the tx queue, i.e. sent a packet before the previous one
>>>>>>>>>>> has been completed. You're also not calling the ethercat
>>>>>>>>>>> transmission functions from different threads, right ?
>>>>>>>>>>>
>>>>>>>>>>>> thank you
>>>>>>>>>>>> raz
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> https://sites.google.com/site/ironspeedlinux/
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> etherlab-users mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> http://lists.etherlab.org/mailman/listinfo/etherlab-users
_______________________________________________
etherlab-users mailing list
[email protected]
http://lists.etherlab.org/mailman/listinfo/etherlab-users
