Hi,

After solving my problem with the e1000 driver
(http://lists.etherlab.org/pipermail/etherlab-users/2011/001190.html),
I can now communicate with the slave devices, send and receive SDOs
and use EoE. However, when I tried PDO communication under RTAI
using the RTAI example program, the first thing I got after
inserting the module was a (reproducible) kernel BUG. Afterwards, as
usual in such cases, I couldn't unload the module properly, and
after a few attempts the system locked up completely and I had
to reboot.

Apr 11 20:34:46 (none) kernel: [344979.737086] ec_rtai_sample: Starting...
Apr 11 20:34:46 (none) kernel: [344979.737102] EtherCAT: Requesting master 0...
Apr 11 20:34:46 (none) kernel: [344979.737192] EtherCAT: Successfully requested master 0.
Apr 11 20:34:46 (none) kernel: [344979.737206] ec_rtai_sample: Registering domain...
Apr 11 20:34:46 (none) kernel: [344979.737256] ec_rtai_sample: Configuring PDOs...
Apr 11 20:34:46 (none) kernel: [344979.737293] ec_rtai_sample: Registering PDO entries...
Apr 11 20:34:46 (none) kernel: [344979.737328] ec_rtai_sample: Activating master...
Apr 11 20:34:46 (none) kernel: [344979.737362] EtherCAT: Domain0: Logical address 0x00000000, 7 byte, expected working counter 3.
Apr 11 20:34:46 (none) kernel: [344979.737387] EtherCAT:   Datagram domain0-0: Logical offset 0x00000000, 7 byte, type LRW.
Apr 11 20:34:46 (none) kernel: [344979.737411] EtherCAT: Stopping EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737486] EtherCAT: Master thread exited.
Apr 11 20:34:46 (none) kernel: [344979.737519] EtherCAT: Starting EtherCAT-OP thread.
Apr 11 20:34:46 (none) kernel: [344979.737559] EtherCAT: Starting EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737573] ec_rtai_sample: Starting cyclic sample thread...
Apr 11 20:34:46 (none) kernel: [344979.737593] ec_rtai_sample: RT timer started with 597/597 ticks.
Apr 11 20:34:46 (none) kernel: [344979.737609] ec_rtai_sample: Initialized.
Apr 11 20:34:46 (none) kernel: [344979.738106]
Apr 11 20:34:46 (none) kernel: [344979.738107] LXRT CHANGED MODE (TRAP), PID = 3360, VEC = 6, SIGNO = 4.
Apr 11 20:34:46 (none) kernel: [344979.738146] ------------[ cut here ]------------
Apr 11 20:34:46 (none) kernel: [344979.738161] Kernel BUG at c013a453 [verbose debug info unavailable]
Apr 11 20:34:46 (none) kernel: [344979.738178] invalid opcode: 0000 [#1]
Apr 11 20:34:46 (none) kernel: [344979.738193] Modules linked in: ec_rtai_sample(F) ec_e1000 ec_master xt_tcpudp ipt_MASQUERADE ipv6 af_packet ipt_REJECT xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables rtai_math rtai_fifos dm_mod loop rt_e1000_new rtnet snd_hda_intel rtai_rtdm snd_pcm_oss snd_mixer_oss rtai_sem rtai_lxrt rtai_hal snd_pcm snd_timer snd_page_alloc snd_hwdep serio_raw snd i2c_i801 psmouse heci i2c_core e1000e intel_agp agpgart pcspkr soundcore evdev ext3 jbd mbcache sg sd_mod usbhid hid ata_generic ata_piix r8169 pata_jmicron libata scsi_mod ehci_hcd uhci_hcd usbcore fuse
Apr 11 20:34:46 (none) kernel: [344979.738403]
Apr 11 20:34:46 (none) kernel: [344979.738415] Pid: 3360, comm: U:HARD:0:14 Tainted: PF       (2.6.24-16-rtai #1)
Apr 11 20:34:46 (none) kernel: [344979.738440] EIP: 0060:[<c013a453>] EFLAGS: 00010202 CPU: 0
Apr 11 20:34:46 (none) kernel: [344979.738459] EIP is at __ipipe_restore_root+0xc/0x22
Apr 11 20:34:46 (none) kernel: [344979.738475] EAX: 00000001 EBX: c02f5b80 ECX: 00000000 EDX: c020cb1c
Apr 11 20:34:46 (none) kernel: [344979.738496] ESI: 00000000 EDI: f7f75f00 EBP: c02f5bec ESP: dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.738511]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Apr 11 20:34:46 (none) kernel: [344979.738526] Process U:HARD:0:14 (pid: 3360, ti=dfa08000 task=dfbfaac0 task.ti=dfa08000)<0>
Apr 11 20:34:46 (none) kernel: [344979.738542] I-pipe domain Linux
Apr 11 20:34:46 (none) kernel: [344979.738554] Stack: c0153ee9 f7807cd0 c02dd400 f7f75f00 00000020 f7c6d020 0000004c c02f5b80
Apr 11 20:34:46 (none) kernel: [344979.738589]        0000002e 00000020 c020cb1c f7f1be08 00000000 f7f1be08 df909058 0000003c
Apr 11 20:34:46 (none) kernel: [344979.738625]        0000002e 0000003c f8eaf45d ffffffff df909084 f7fe7812 00000011 df909058
Apr 11 20:34:46 (none) kernel: [344979.738660] Call Trace:
Apr 11 20:34:46 (none) kernel: [344979.738680]  [<c0153ee9>] kmem_cache_alloc+0x6e/0xa6
Apr 11 20:34:46 (none) kernel: [344979.738698]  [<c020cb1c>] __alloc_skb+0x2d/0x10c
Apr 11 20:34:46 (none) kernel: [344979.738714]  [<f8eaf45d>] ec_debug_send+0x31/0x17b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738738]  [<f8ea37fd>] ecdev_receive+0x48/0x5b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738761]  [<f8832887>] e1000_clean_rx_irq+0x2b8/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738782]  [<f8832637>] e1000_clean_rx_irq+0x68/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738803]  [<f88325cf>] e1000_clean_rx_irq+0x0/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738823]  [<f882dde0>] e1000_intr+0xc9/0x15c [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738842]  [<f8ea3635>] ec_device_poll+0x10/0x11 [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738863]  [<f8eaaa14>] ecrt_master_receive+0x11/0xca [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738886]  [<f8a374c2>] rt_schedule+0x3ca/0x742 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738912]  [<f890e217>] run+0x26/0xcb [ec_rtai_sample]
Apr 11 20:34:46 (none) kernel: [344979.738928]  [<f8a39a4a>] kthread_fun+0x113/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738952]  [<f8a39937>] kthread_fun+0x0/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738973]  [<c0104087>] kernel_thread_helper+0x7/0x10
Apr 11 20:34:46 (none) kernel: [344979.738989]  =======================
Apr 11 20:34:46 (none) kernel: [344979.739002] Code: 0b eb fe fa 0f ba 35 a4 14 2e c0 00 83 3d a8 14 2e c0 00 74 08 83 c8 ff e8 f4 fb ff ff fb c3 81 3d 24 19 2e c0 80 31 38 c0 74 04 <0f> 0b eb fe 85 c0 74 09 0f ba 2d a4 14 2e c0 00 c3 e9 b2 ff ff
Apr 11 20:34:46 (none) kernel: [344979.739129] EIP: [<c013a453>] __ipipe_restore_root+0xc/0x22 SS:ESP 0068:dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.739351] ---[ end trace 97ed01d355d65d2b ]---
Apr 11 20:35:06 (none) kernel: [344999.105184] ec_rtai_sample: Stopping...
Apr 11 20:35:06 (none) kernel: [344999.105230] EtherCAT: Releasing master 0...
Apr 11 20:35:06 (none) kernel: [344999.105268] EtherCAT: Stopping EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105399] EtherCAT: Master thread exited.
Apr 11 20:35:06 (none) kernel: [344999.105460] EtherCAT: Starting EtherCAT-IDLE thread.
Apr 11 20:35:06 (none) kernel: [344999.105525] EtherCAT: Starting EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105563] EtherCAT: Released master 0.
Apr 11 20:35:06 (none) kernel: [344999.105600] ec_rtai_sample: Unloading.
Apr 11 20:35:06 (none) kernel: [344999.114247] EtherCAT DEBUG: Slave 1 is not configured.
Apr 11 20:35:06 (none) kernel: [344999.149701] EtherCAT DEBUG: Slave 0 is not configured.

Fortunately, the call trace quickly showed what went wrong: the
debug network interface (ec_debug_send()) tried to allocate a socket
buffer, and that allocation crashes when it happens in the cyclic
RTAI task. I suppose that's what this paragraph in the manual
alludes to:

  "Attention  The socket buffers needed for the operation of debug
  interfaces have to be allocated dynamically. Some Linux realtime
  extensions do not allow this in realtime context!"

BTW, I think that's a bit of an understatement. If something is not
allowed, I'd expect to get some kind of error message rather than a
system lockup. But I guess that's just a matter of wording in the
manual, since the problem itself isn't going away.

Anyway, since I need the debug device to analyze SDO traffic,
and I really don't want to reboot every time I forget to shut it
down before starting PDO transfers, I implemented the following
workaround: a flag that temporarily disables the debug device and
can be set with ec_debug_disable(). I've modified rtai_sample.c to
set it around the master calls in the cyclic task. Of course, one
still cannot analyze PDO packets this way, but at least the other
packets can be captured without crashing.

--- ethercat-1.4.0/include/ecrt.h.orig  2008-12-29 16:27:39.000000000 +0100
+++ ethercat-1.4.0/include/ecrt.h       2011-04-12 14:16:02.000000000 +0200
@@ -897,6 +897,12 @@
         ec_sdo_request_t *req /**< SDO request. */
         );
 
+/** Temporarily disable the debug interface.
+ */
+void ec_debug_disable(
+        int disable /**< 1 to disable, 0 to re-enable. */
+        );
+
 /******************************************************************************
  * Bitwise read/write macros
  *****************************************************************************/
--- ethercat-1.4.0/master/debug.c.orig  2008-12-29 15:10:27.000000000 +0100
+++ ethercat-1.4.0/master/debug.c       2011-04-12 14:05:34.000000000 +0200
@@ -39,6 +39,8 @@
 
 /*****************************************************************************/
 
+static int ec_debug_disabled = 0;
+
 // net_device functions
 int ec_dbgdev_open(struct net_device *);
 int ec_dbgdev_stop(struct net_device *);
@@ -120,7 +122,7 @@
 {
     struct sk_buff *skb;
 
-    if (!dbg->opened) return;
+    if (!dbg->opened || ec_debug_disabled) return;
 
     // allocate socket buffer
     if (!(skb = dev_alloc_skb(size))) {
@@ -142,6 +144,17 @@
     netif_rx(skb);
 }
 
+/*****************************************************************************/
+
+/**
+   Temporarily disable the debug interface.
+*/
+
+void ec_debug_disable(int disable)
+{
+    ec_debug_disabled = disable;
+}
+
 /******************************************************************************
  *  NET_DEVICE functions
  *****************************************************************************/
@@ -203,3 +216,11 @@
 }
 
 /*****************************************************************************/
+
+/** \cond */
+
+EXPORT_SYMBOL(ec_debug_disable);
+
+/** \endcond */
+
+/*****************************************************************************/
--- ethercat-1.4.0/examples/rtai/rtai_sample.c.orig     2008-12-29 16:19:16.000000000 +0100
+++ ethercat-1.4.0/examples/rtai/rtai_sample.c  2011-04-12 14:11:20.000000000 +0200
@@ -204,8 +204,12 @@
 
         // receive process data
         rt_sem_wait(&master_sem);
+        // disable the debug interface which is not RTAI-safe
+        ec_debug_disable(1);
         ecrt_master_receive(master);
         ecrt_domain_process(domain1);
+        // re-enable the debug interface
+        ec_debug_disable(0);
         rt_sem_signal(&master_sem);
 
         // check process data state (optional)
@@ -230,8 +234,12 @@
         EC_WRITE_U8(domain1_pd + off_dig_out, blink ? 0x06 : 0x09);
 
         rt_sem_wait(&master_sem);
+        // disable the debug interface which is not RTAI-safe
+        ec_debug_disable(1);
         ecrt_domain_queue(domain1);
         ecrt_master_send(master);
+        // re-enable the debug interface
+        ec_debug_disable(0);
         rt_sem_signal(&master_sem);
                
         rt_task_wait_period();
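
For readers who want to try the gating logic outside the kernel, here
is a minimal user-space sketch of the flag check added to
ec_debug_send() above. It is only a model under stated assumptions:
struct dbg_model and dbg_model_send() are hypothetical stand-ins for
the master's debug device and for ec_debug_send(), and a counter
replaces the real dev_alloc_skb()/netif_rx() path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the master's debug device state. */
struct dbg_model {
    bool opened;         /* debug network interface is up */
    unsigned forwarded;  /* frames that would reach the network stack */
};

/* Same role as the file-scope flag the patch adds to master/debug.c. */
static int ec_debug_disabled = 0;

/* Same signature as in the patch: 1 disables, 0 re-enables. */
void ec_debug_disable(int disable)
{
    ec_debug_disabled = disable;
}

/* Models the patched ec_debug_send(): the early return happens before
 * any buffer would be allocated, which is the whole point of the flag.
 * Returns true if the frame would have been forwarded. */
bool dbg_model_send(struct dbg_model *dbg,
                    const unsigned char *data, size_t size)
{
    (void) data;
    (void) size;

    if (!dbg->opened || ec_debug_disabled)
        return false;

    /* The real function allocates an skb and calls netif_rx() here. */
    dbg->forwarded++;
    return true;
}
```

Note that the flag is global rather than per master, so while the
cyclic task holds it set, frames from any other context are dropped
from the debug interface as well.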

Another thing: while looking through the code for other places that
are problematic with regard to memory allocation in the cyclic task,
I found the following one, where I think a check for adapter->ecdev
is needed. I only looked at the e1000 driver for 2.6.24 because
that's what we will be using. The issue may also exist in other
versions.

--- ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c.orig      2011-03-24 18:27:40.000000000 +0100
+++ ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c   2011-04-12 13:13:16.000000000 +0200
@@ -3386,7 +3386,8 @@
                                if (!__pskb_pull_tail(skb, pull_size)) {
                                        DPRINTK(DRV, ERR,
                                                "__pskb_pull_tail failed.\n");
-                                       dev_kfree_skb_any(skb);
+                                       if (!adapter->ecdev)
+                                               dev_kfree_skb_any(skb);
                                        return NETDEV_TX_OK;
                                }
                                len = skb->len - skb->data_len;

Regards,
Frank
 
-- 
Dipl.-Math. Frank Heckenbach <f.heckenb...@fh-soft.de>
Systemprogrammierung, EDV-Beratung
Stubenlohstr. 6, 91052 Erlangen, Deutschland
Tel.: +49-9131-21359