Hi,

after solving my problem with the e1000 driver (http://lists.etherlab.org/pipermail/etherlab-users/2011/001190.html), I can now communicate with the slave devices, send and receive SDOs and use EoE. However, when I tried PDO communication under RTAI, using the RTAI example program, the first thing I got after inserting the module was a kernel bug (reproducible). Afterwards, as usual in such cases, I couldn't unload the module properly, and after a few attempts the system would lock up completely and I had to reboot.

Apr 11 20:34:46 (none) kernel: [344979.737086] ec_rtai_sample: Starting...
Apr 11 20:34:46 (none) kernel: [344979.737102] EtherCAT: Requesting master 0...
Apr 11 20:34:46 (none) kernel: [344979.737192] EtherCAT: Successfully requested master 0.
Apr 11 20:34:46 (none) kernel: [344979.737206] ec_rtai_sample: Registering domain...
Apr 11 20:34:46 (none) kernel: [344979.737256] ec_rtai_sample: Configuring PDOs...
Apr 11 20:34:46 (none) kernel: [344979.737293] ec_rtai_sample: Registering PDO entries...
Apr 11 20:34:46 (none) kernel: [344979.737328] ec_rtai_sample: Activating master...
Apr 11 20:34:46 (none) kernel: [344979.737362] EtherCAT: Domain0: Logical address 0x00000000, 7 byte, expected working counter 3.
Apr 11 20:34:46 (none) kernel: [344979.737387] EtherCAT: Datagram domain0-0: Logical offset 0x00000000, 7 byte, type LRW.
Apr 11 20:34:46 (none) kernel: [344979.737411] EtherCAT: Stopping EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737486] EtherCAT: Master thread exited.
Apr 11 20:34:46 (none) kernel: [344979.737519] EtherCAT: Starting EtherCAT-OP thread.
Apr 11 20:34:46 (none) kernel: [344979.737559] EtherCAT: Starting EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737573] ec_rtai_sample: Starting cyclic sample thread...
Apr 11 20:34:46 (none) kernel: [344979.737593] ec_rtai_sample: RT timer started with 597/597 ticks.
Apr 11 20:34:46 (none) kernel: [344979.737609] ec_rtai_sample: Initialized.
Apr 11 20:34:46 (none) kernel: [344979.738106]
Apr 11 20:34:46 (none) kernel: [344979.738107] LXRT CHANGED MODE (TRAP), PID = 3360, VEC = 6, SIGNO = 4.
Apr 11 20:34:46 (none) kernel: [344979.738146] ------------[ cut here ]------------
Apr 11 20:34:46 (none) kernel: [344979.738161] Kernel BUG at c013a453 [verbose debug info unavailable]
Apr 11 20:34:46 (none) kernel: [344979.738178] invalid opcode: 0000 [#1]
Apr 11 20:34:46 (none) kernel: [344979.738193] Modules linked in: ec_rtai_sample(F) ec_e1000 ec_master xt_tcpudp ipt_MASQUERADE ipv6 af_packet ipt_REJECT xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables rtai_math rtai_fifos dm_mod loop rt_e1000_new rtnet snd_hda_intel rtai_rtdm snd_pcm_oss snd_mixer_oss rtai_sem rtai_lxrt rtai_hal snd_pcm snd_timer snd_page_alloc snd_hwdep serio_raw snd i2c_i801 psmouse heci i2c_core e1000e intel_agp agpgart pcspkr soundcore evdev ext3 jbd mbcache sg sd_mod usbhid hid ata_generic ata_piix r8169 pata_jmicron libata scsi_mod ehci_hcd uhci_hcd usbcore fuse
Apr 11 20:34:46 (none) kernel: [344979.738403]
Apr 11 20:34:46 (none) kernel: [344979.738415] Pid: 3360, comm: U:HARD:0:14 Tainted: PF (2.6.24-16-rtai #1)
Apr 11 20:34:46 (none) kernel: [344979.738440] EIP: 0060:[<c013a453>] EFLAGS: 00010202 CPU: 0
Apr 11 20:34:46 (none) kernel: [344979.738459] EIP is at __ipipe_restore_root+0xc/0x22
Apr 11 20:34:46 (none) kernel: [344979.738475] EAX: 00000001 EBX: c02f5b80 ECX: 00000000 EDX: c020cb1c
Apr 11 20:34:46 (none) kernel: [344979.738496] ESI: 00000000 EDI: f7f75f00 EBP: c02f5bec ESP: dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.738511] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Apr 11 20:34:46 (none) kernel: [344979.738526] Process U:HARD:0:14 (pid: 3360, ti=dfa08000 task=dfbfaac0 task.ti=dfa08000)<0>
Apr 11 20:34:46 (none) kernel: [344979.738542] I-pipe domain Linux
Apr 11 20:34:46 (none) kernel: [344979.738554] Stack: c0153ee9 f7807cd0 c02dd400 f7f75f00 00000020 f7c6d020 0000004c c02f5b80
Apr 11 20:34:46 (none) kernel: [344979.738589] 0000002e 00000020 c020cb1c f7f1be08 00000000 f7f1be08 df909058 0000003c
Apr 11 20:34:46 (none) kernel: [344979.738625] 0000002e 0000003c f8eaf45d ffffffff df909084 f7fe7812 00000011 df909058
Apr 11 20:34:46 (none) kernel: [344979.738660] Call Trace:
Apr 11 20:34:46 (none) kernel: [344979.738680] [<c0153ee9>] kmem_cache_alloc+0x6e/0xa6
Apr 11 20:34:46 (none) kernel: [344979.738698] [<c020cb1c>] __alloc_skb+0x2d/0x10c
Apr 11 20:34:46 (none) kernel: [344979.738714] [<f8eaf45d>] ec_debug_send+0x31/0x17b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738738] [<f8ea37fd>] ecdev_receive+0x48/0x5b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738761] [<f8832887>] e1000_clean_rx_irq+0x2b8/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738782] [<f8832637>] e1000_clean_rx_irq+0x68/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738803] [<f88325cf>] e1000_clean_rx_irq+0x0/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738823] [<f882dde0>] e1000_intr+0xc9/0x15c [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738842] [<f8ea3635>] ec_device_poll+0x10/0x11 [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738863] [<f8eaaa14>] ecrt_master_receive+0x11/0xca [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738886] [<f8a374c2>] rt_schedule+0x3ca/0x742 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738912] [<f890e217>] run+0x26/0xcb [ec_rtai_sample]
Apr 11 20:34:46 (none) kernel: [344979.738928] [<f8a39a4a>] kthread_fun+0x113/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738952] [<f8a39937>] kthread_fun+0x0/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738973] [<c0104087>] kernel_thread_helper+0x7/0x10
Apr 11 20:34:46 (none) kernel: [344979.738989] =======================
Apr 11 20:34:46 (none) kernel: [344979.739002] Code: 0b eb fe fa 0f ba 35 a4 14 2e c0 00 83 3d a8 14 2e c0 00 74 08 83 c8 ff e8 f4 fb ff ff fb c3 81 3d 24 19 2e c0 80 31 38 c0 74 04 <0f> 0b eb fe 85 c0 74 09 0f ba 2d a4 14 2e c0 00 c3 e9 b2 ff ff
Apr 11 20:34:46 (none) kernel: [344979.739129] EIP: [<c013a453>] __ipipe_restore_root+0xc/0x22 SS:ESP 0068:dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.739351] ---[ end trace 97ed01d355d65d2b ]---
Apr 11 20:35:06 (none) kernel: [344999.105184] ec_rtai_sample: Stopping...
Apr 11 20:35:06 (none) kernel: [344999.105230] EtherCAT: Releasing master 0...
Apr 11 20:35:06 (none) kernel: [344999.105268] EtherCAT: Stopping EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105399] EtherCAT: Master thread exited.
Apr 11 20:35:06 (none) kernel: [344999.105460] EtherCAT: Starting EtherCAT-IDLE thread.
Apr 11 20:35:06 (none) kernel: [344999.105525] EtherCAT: Starting EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105563] EtherCAT: Released master 0.
Apr 11 20:35:06 (none) kernel: [344999.105600] ec_rtai_sample: Unloading.
Apr 11 20:35:06 (none) kernel: [344999.114247] EtherCAT DEBUG: Slave 1 is not configured.
Apr 11 20:35:06 (none) kernel: [344999.149701] EtherCAT DEBUG: Slave 0 is not configured.

Fortunately, the call trace quickly showed what went wrong: The debug network interface (ec_debug_send) tried to allocate a socket buffer, and that allocation crashes when it happens in the cyclic RTAI task. I suppose that's what this paragraph in the manual alludes to:

"Attention The socket buffers needed for the operation of debug interfaces have to be allocated dynamically. Some Linux realtime extensions do not allow this in realtime context!"

BTW, I think that's a bit of an understatement. If something is not allowed, I'd expect to get some kind of error message rather than a system lockup. But I guess that's just a matter of wording in the manual, since the problem itself isn't going away.

Anyway, since I need to use the debug device to analyze SDO traffic, and I really don't want to reboot every time I forget to shut it down before starting PDO transfers, I implemented the following workaround: A flag to temporarily disable the debug device, which can be set by ec_debug_disable(). I've modified rtai_sample.c to set and clear it in the cyclic task.
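
To show the idea in isolation, here is a minimal user-space sketch of the guard, not the kernel code itself. The names dbg_sketch, dbg_sketch_send() and dbg_sketch_disable() are made up for illustration, and malloc() merely stands in for dev_alloc_skb(); the only point is that the flag is checked before anything gets allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the debug device; NOT the real ec_debug_t. */
struct dbg_sketch {
    bool opened;        /* plays the role of dbg->opened in master/debug.c */
    unsigned forwarded; /* frames handed to the (simulated) debug interface */
};

/* Plays the role of ec_debug_disabled from the patch: one global flag. */
static int dbg_disabled = 0;

/* 1 to disable forwarding, 0 to re-enable it. */
void dbg_sketch_disable(int disable)
{
    dbg_disabled = disable;
}

/* Returns 1 if the frame was forwarded, 0 if it was skipped. */
int dbg_sketch_send(struct dbg_sketch *dbg,
                    const unsigned char *data, size_t size)
{
    unsigned char *copy;

    assert(dbg != NULL && data != NULL);

    /* Bail out before any allocation happens. */
    if (!dbg->opened || dbg_disabled)
        return 0;

    copy = malloc(size); /* assumption: stands in for dev_alloc_skb() */
    if (!copy)
        return 0;

    memcpy(copy, data, size);
    dbg->forwarded++;
    free(copy); /* the real code passes the skb on to netif_rx() instead */
    return 1;
}
```

Note that the flag is global and not per master, so frames that arrive from any other context while it is set are skipped by the debug interface as well.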

Of course, one still cannot analyze PDO packets this way, but at least one can analyze other packets without crashing.

--- ethercat-1.4.0/include/ecrt.h.orig 2008-12-29 16:27:39.000000000 +0100
+++ ethercat-1.4.0/include/ecrt.h 2011-04-12 14:16:02.000000000 +0200
@@ -897,6 +897,12 @@
         ec_sdo_request_t *req /**< SDO request. */
         );
 
+/** Temporarily disable the debug interface.
+ */
+void ec_debug_disable(
+        int disable /**< 1 to disable, 0 to re-enable. */
+        );
+
 /******************************************************************************
  * Bitwise read/write macros
  *****************************************************************************/
--- ethercat-1.4.0/master/debug.c.orig 2008-12-29 15:10:27.000000000 +0100
+++ ethercat-1.4.0/master/debug.c 2011-04-12 14:05:34.000000000 +0200
@@ -39,6 +39,8 @@
 
 /*****************************************************************************/
 
+static int ec_debug_disabled = 0;
+
 // net_device functions
 int ec_dbgdev_open(struct net_device *);
 int ec_dbgdev_stop(struct net_device *);
@@ -120,7 +122,7 @@
 {
     struct sk_buff *skb;
 
-    if (!dbg->opened) return;
+    if (!dbg->opened || ec_debug_disabled) return;
 
     // allocate socket buffer
     if (!(skb = dev_alloc_skb(size))) {
@@ -142,6 +144,17 @@
     netif_rx(skb);
 }
 
+/*****************************************************************************/
+
+/**
+   Temporarily disable the debug interface.
+*/
+
+void ec_debug_disable(int disable)
+{
+    ec_debug_disabled = disable;
+}
+
 /******************************************************************************
  * NET_DEVICE functions
  *****************************************************************************/
@@ -203,3 +216,11 @@
 }
 
 /*****************************************************************************/
+
+/** \cond */
+
+EXPORT_SYMBOL(ec_debug_disable);
+
+/** \endcond */
+
+/*****************************************************************************/
--- ethercat-1.4.0/examples/rtai/rtai_sample.c.orig 2008-12-29 16:19:16.000000000 +0100
+++ ethercat-1.4.0/examples/rtai/rtai_sample.c 2011-04-12 14:11:20.000000000 +0200
@@ -204,8 +204,12 @@
 
         // receive process data
         rt_sem_wait(&master_sem);
+        // disable the debug interface which is not RTAI-safe
+        ec_debug_disable(1);
         ecrt_master_receive(master);
         ecrt_domain_process(domain1);
+        // re-enable the debug interface
+        ec_debug_disable(0);
         rt_sem_signal(&master_sem);
 
         // check process data state (optional)
@@ -230,8 +234,12 @@
         EC_WRITE_U8(domain1_pd + off_dig_out, blink ? 0x06 : 0x09);
 
         rt_sem_wait(&master_sem);
+        // disable the debug interface which is not RTAI-safe
+        ec_debug_disable(1);
         ecrt_domain_queue(domain1);
         ecrt_master_send(master);
+        // re-enable the debug interface
+        ec_debug_disable(0);
         rt_sem_signal(&master_sem);
 
         rt_task_wait_period();

Another thing: When I looked through the code for other problematic places WRT possible memory allocation in the cyclic task, I found the following spot, where I think a check for adapter->ecdev is needed. I only looked at the e1000 driver for 2.6.24 because that's what we will be using. The issue may also exist in other versions.

--- ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c.orig 2011-03-24 18:27:40.000000000 +0100
+++ ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c 2011-04-12 13:13:16.000000000 +0200
@@ -3386,7 +3386,8 @@
 				if (!__pskb_pull_tail(skb, pull_size)) {
 					DPRINTK(DRV, ERR,
 						"__pskb_pull_tail failed.\n");
-					dev_kfree_skb_any(skb);
+					if (!adapter->ecdev)
+						dev_kfree_skb_any(skb);
 					return NETDEV_TX_OK;
 				}
 				len = skb->len - skb->data_len;

Regards,
Frank

-- 
Dipl.-Math. Frank Heckenbach <f.heckenb...@fh-soft.de>
Systemprogrammierung, EDV-Beratung
Stubenlohstr. 6, 91052 Erlangen, Deutschland
Tel.: +49-9131-21359