[meta-freescale] i.mx6sx linux-fslc sema4 bug

Petr Kubizňák Wed, 11 Oct 2017 07:53:28 -0700

This is an official report of the issue originally reported athttps://community.nxp.com/thread/461521 .


   How to reproduce

I hit on a problem in linux-fslc 4.1.38 (default in yocto-morty),available at GitHub - Freescale/linux-fslc: Linux kernel source tree<https://community.nxp.com/external-link.jspa?url=https%3A%2F%2Fgithub.com%2FFreescale%2Flinux-fslc%2Ftree%2F4.1-2.0.x-imx>. It is related to the sema4 mutex locking mechanism(drivers/char/imx_amp/imx_sema4.c). The problem only arises when the M4core is running and its RDC domain is set. I'm running a trivialFreeRTOS program based on the hello_world demo from the NXP's officialdistribution(https://www.nxp.com/webapp/Download?colCode=FreeRTOS_MX6SX_1.0.1_WIN&appType=license&Parent_nodeId=142289606182472763482…<https://community.nxp.com/external-link.jspa?url=https%3A%2F%2Fwww.nxp.com%2Fwebapp%2FDownload%3FcolCode%3DFreeRTOS_MX6SX_1.0.1_WIN%26appType%3Dlicense%26Parent_nodeId%3D1422896061824727634827%26Parent_pageType%3Dproduct>). The program simply sets the RDC domain and loops forever:


|RDC_SetDomainID(RDC, rdcMdaM4, BOARD_DOMAIN_ID, false); while (1) ;|

With Yocto (Morty), I built the qt4e-demo-image (fslc-framebufferdistro). When I boot the image, qtdemoE demo starts. The problem can bereproduced with an intensive I2C communication. In my case, I touch onthe screen wildly, generating the touch events. My interrupt-driventouchscreen driver calls i2c_smbus_read_byte_data() to read the data.



   What it does

After a while, the following message starts printing in the terminalinfinitely:


drivers/char/imx_amp/imx_sema4.c -> _imx_sema4_mutex_lock 137 already locked, 
wait! num 6 val 1

The FreeRTOS on the M4 core continues without problem. If I stop thedebugger and check the SEMA4_Gate06 register, its value is set to 1,indicating it is locked by the A9 core.



   Workaround

After spending hours on this problem, I formed a testing patch (seeattached). In the _imx_sema4_mutex_lock() function, it disables theerror log for the first 5 occurences. As a result, instead of beingimprisoned in the infinite error message loop, I just get a singlemessage and everything seems to work fine:


286 Trying to unlock an unlock mutex.

When this repeats 5-times (which might take a while), the system entersthe infinite message loop. Including my added prints, the output looksas follows:


286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
286 Trying to unlock an unlock mutex.
drivers/char/imx_amp/imx_sema4.c -> _imx_sema4_mutex_lock 145 already locked, 
wait! num 6 val 1. 6
CPU: 0 PID: 66 Comm: irq/203-crtouch Tainted: G           O    
4.1.38-fslc-01914-gb3f76d9ae87e-dirty #12
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800184e0>] (unwind_backtrace) from [<80013cfc>] (show_stack+0x20/0x24)
[<80013cfc>] (show_stack) from [<8091b2ec>] (dump_stack+0x80/0x94)
[<8091b2ec>] (dump_stack) from [<80406be8>] (_imx_sema4_mutex_lock+0x11c/0x1c4)
[<80406be8>] (_imx_sema4_mutex_lock) from [<80406d58>] 
(imx_sema4_mutex_lock+0xa4/0x128)
[<80406d58>] (imx_sema4_mutex_lock) from [<80025738>] 
(clk_gate2_do_shared_clks+0x78/0xe4)
[<80025738>] (clk_gate2_do_shared_clks) from [<800258b8>] 
(clk_gate2_disable+0x58/0x7c)
[<800258b8>] (clk_gate2_disable) from [<806b6838>] (clk_core_disable+0x64/0x1e4)
[<806b6838>] (clk_core_disable) from [<806b7630>] (clk_disable+0x34/0x40)
[<806b7630>] (clk_disable) from [<805d2eec>] (i2c_imx_xfer+0x194/0xe28)
[<805d2eec>] (i2c_imx_xfer) from [<805ce5a0>] (__i2c_transfer+0x160/0x678)
[<805ce5a0>] (__i2c_transfer) from [<805ceb2c>] (i2c_transfer+0x74/0xa0)
[<805ceb2c>] (i2c_transfer) from [<805cf1a4>] (i2c_smbus_xfer+0x590/0x9d8)
[<805cf1a4>] (i2c_smbus_xfer) from [<805cf7ec>] 
(i2c_smbus_read_byte_data+0x3c/0x4c)
[<805cf7ec>] (i2c_smbus_read_byte_data) from [<805c07d0>] 
(crtouch_ts_interrupt+0x90/0x14c)
[<805c07d0>] (crtouch_ts_interrupt) from [<8007e824>] (irq_thread_fn+0x2c/0x50)
[<8007e824>] (irq_thread_fn) from [<8007eb80>] (irq_thread+0x13c/0x188)
[<8007eb80>] (irq_thread) from [<800576b4>] (kthread+0xec/0x104)
[<800576b4>] (kthread) from [<8000fca8>] (ret_from_fork+0x14/0x2c)
l : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
lo: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
k : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
ko: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
u : 97282, 8852a240, clk_gate2_do_shared_clks+0xa4/0xe4
uo: 97180, 8852a240, clk_pllv3_do_shared_clks+0xa4/0xe4
drivers/char/imx_amp/imx_sema4.c -> _imx_sema4_mutex_lock 145 already locked, 
wait! num 6 val 1. 7
CPU: 0 PID: 66 Comm: irq/203-crtouch Tainted: G           O    
4.1.38-fslc-01914-gb3f76d9ae87e-dirty #12
Hardware name: Freescale i.MX6 SoloX (Device Tree)
[<800184e0>] (unwind_backtrace) from [<80013cfc>] (show_stack+0x20/0x24)
[<80013cfc>] (show_stack) from [<8091b2ec>] (dump_stack+0x80/0x94)
[<8091b2ec>] (dump_stack) from [<80406be8>] (_imx_sema4_mutex_lock+0x11c/0x1c4)
[<80406be8>] (_imx_sema4_mutex_lock) from [<80406d88>] 
(imx_sema4_mutex_lock+0xd4/0x128)
[<80406d88>] (imx_sema4_mutex_lock) from [<80025738>] 
(clk_gate2_do_shared_clks+0x78/0xe4)
[<80025738>] (clk_gate2_do_shared_clks) from [<800258b8>] 
(clk_gate2_disable+0x58/0x7c)
[<800258b8>] (clk_gate2_disable) from [<806b6838>] (clk_core_disable+0x64/0x1e4)
[<806b6838>] (clk_core_disable) from [<806b7630>] (clk_disable+0x34/0x40)
[<806b7630>] (clk_disable) from [<805d2eec>] (i2c_imx_xfer+0x194/0xe28)
[<805d2eec>] (i2c_imx_xfer) from [<805ce5a0>] (__i2c_transfer+0x160/0x678)
[<805ce5a0>] (__i2c_transfer) from [<805ceb2c>] (i2c_transfer+0x74/0xa0)
[<805ceb2c>] (i2c_transfer) from [<805cf1a4>] (i2c_smbus_xfer+0x590/0x9d8)
[<805cf1a4>] (i2c_smbus_xfer) from [<805cf7ec>] 
(i2c_smbus_read_byte_data+0x3c/0x4c)
[<805cf7ec>] (i2c_smbus_read_byte_data) from [<805c07d0>] 
(crtouch_ts_interrupt+0x90/0x14c)
[<805c07d0>] (crtouch_ts_interrupt) from [<8007e824>] (irq_thread_fn+0x2c/0x50)
[<8007e824>] (irq_thread_fn) from [<8007eb80>] (irq_thread+0x13c/0x188)
[<8007eb80>] (irq_thread) from [<800576b4>] (kthread+0xec/0x104)
[<800576b4>] (kthread) from [<8000fca8>] (ret_from_fork+0x14/0x2c)
l : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
lo: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
k : 97284, 8852a240, clk_gate2_do_shared_clks+0x78/0xe4
ko: 97284, 8852a240, clk_pllv3_do_shared_clks+0x78/0xe4
u : 97282, 8852a240, clk_gate2_do_shared_clks+0xa4/0xe4
uo: 97180, 8852a240, clk_pllv3_do_shared_clks+0xa4/0xe4
... (repeats infinitely) ...


   The problem

The output shows the problem is in calling the imx_sema4_mutex_lock()function from two places at once (clk_gate2_do_shared_clks() andclk_pllv3_do_shared_clks()). One comes from the touchscreen interrupthandler, second comes from some worker thread, but I don't know moredetails about it.

It is obvious that printing anything to the console from an interrupthandler is not okay. So I suggest to disable all debug/error prints inthe /imx_sema4.c/ file. Then the infinite message loop would be avoided.

The question is, *what is the root cause*. Is the debug print really theproblem (i.e. it is a valid situation that mutex_ptr->gate_val ==SEMA4_A9_LOCK), or does it just expose a problem hidden somewhere else(i.e. the debug print should actually never appear).

As the aforementioned output shows, the condition (mutex_ptr->gate_val!= SEMA4_A9_LOCK) in imx_sema4_mutex_unlock() holds true when theproblematic situation arises. I guess it is because the gate_val isalready 0, as the unlock function has been called twice, whichcorresponds to the two locking calls. That suggests the lockingmechanism does not work correctly.


Is my reasoning correct? How to properly fix it?


   Workaround 2

Recently I found that the /evbug/ kernel module might be incident withthe issue. When I rmmod evbug, I never see the issue appear. When Iinsmod evbug again, the issue reappears. I don't think it's the rootcause, though. Is it?



Petr

Petr Kubiznak
Embedded SW Engineer
ELNICO s.r.o.
Heydukova 2923, 544 01 Dvur Kralove n.L., Czech Republic
T: +420 499 321 019 | M: +420 724 053 668

www.elnico.cz <http://www.elnico.cz/> | www.sqm4.com<http://www.sqm4.com> | [email protected]<mailto:[email protected]>

diff --git a/drivers/char/imx_amp/imx_sema4.c b/drivers/char/imx_amp/imx_sema4.c
index 412202f11cbb..228626b12e07 100644
--- a/drivers/char/imx_amp/imx_sema4.c
+++ b/drivers/char/imx_amp/imx_sema4.c
@@ -105,6 +105,13 @@ int imx_sema4_mutex_destroy(struct imx_sema4_mutex *mutex_ptr)
 }
 EXPORT_SYMBOL(imx_sema4_mutex_destroy);
 
+typedef struct debug_data_struct {
+  void * mutex;
+  void * addr;
+  int cnt;
+};
+static struct debug_data_struct _l, _lo, _k, _ko, _u, _uo;
+
 /*!
  * \brief Lock the mutex, shouldn't be interruted by INT.
  *
@@ -119,6 +126,7 @@ EXPORT_SYMBOL(imx_sema4_mutex_destroy);
  *
  * \see imx_sema4_mutex_unlock
  */
+static int err_cnt = 0;
 int _imx_sema4_mutex_lock(struct imx_sema4_mutex *mutex_ptr)
 {
 	int ret = 0, i = 0;
@@ -130,11 +138,18 @@ int _imx_sema4_mutex_lock(struct imx_sema4_mutex *mutex_ptr)
 	mutex_ptr->gate_val = readb(imx6_sema4->ioaddr + i);
 	mutex_ptr->gate_val &= SEMA4_GATE_MASK;
 	/* Check to see if this core already own it */
-	if (mutex_ptr->gate_val == SEMA4_A9_LOCK) {
+	if ((mutex_ptr->gate_val == SEMA4_A9_LOCK) && (++err_cnt > 5)) {
 		/* return -EBUSY, invoker should be in sleep, and re-lock ag */
-		pr_err("%s -> %s %d already locked, wait! num %d val %d.\n",
+		pr_err("%s -> %s %d already locked, wait! num %d val %d. %d\n",
 				__FILE__, __func__, __LINE__,
-				i, mutex_ptr->gate_val);
+				i, mutex_ptr->gate_val, err_cnt);
+		dump_stack();
+		pr_err("l : %d, %p, %pS\n", _l .cnt, _l .mutex, _l .addr);
+		pr_err("lo: %d, %p, %pS\n", _lo.cnt, _lo.mutex, _lo.addr);
+		pr_err("k : %d, %p, %pS\n", _k .cnt, _k .mutex, _k .addr);
+		pr_err("ko: %d, %p, %pS\n", _ko.cnt, _ko.mutex, _ko.addr);
+		pr_err("u : %d, %p, %pS\n", _u .cnt, _u .mutex, _u .addr);
+		pr_err("uo: %d, %p, %pS\n", _uo.cnt, _uo.mutex, _uo.addr);
 		ret = -EBUSY;
 		goto out;
 	} else {
@@ -205,7 +220,22 @@ int imx_sema4_mutex_lock(struct imx_sema4_mutex *mutex_ptr)
 	int ret = 0;
 	unsigned long flags;
 
+struct debug_data_struct * _d = &_l;
+struct debug_data_struct * _do = &_lo;
+struct debug_data_struct * _e = &_k;
+struct debug_data_struct * _eo = &_ko;
+
+if ((++_d->cnt > 1000) && ((_d->mutex != mutex_ptr) || (_d->addr != __builtin_return_address(0)))) {
+  *_do = *_d;
+  _d->mutex = mutex_ptr;
+  _d->addr = __builtin_return_address(0);
+}
 	spin_lock_irqsave(&imx6_sema4->lock, flags);
+if ((++_e->cnt > 1000) && ((_e->mutex != mutex_ptr) || (_e->addr != __builtin_return_address(0)))) {
+  *_eo = *_e;
+  _e->mutex = mutex_ptr;
+  _e->addr = __builtin_return_address(0);
+}
 	ret = _imx_sema4_mutex_lock(mutex_ptr);
 	spin_unlock_irqrestore(&imx6_sema4->lock, flags);
 	while (-EBUSY == ret) {
@@ -237,6 +267,14 @@ int imx_sema4_mutex_unlock(struct imx_sema4_mutex *mutex_ptr)
 {
 	int ret = 0, i = 0;
 
+struct debug_data_struct * _d = &_u;
+struct debug_data_struct * _do = &_uo;
+
+if ((++_d->cnt > 1000) && ((_d->mutex != mutex_ptr) || (_d->addr != __builtin_return_address(0)))) {
+  *_do = *_d;
+  _d->mutex = mutex_ptr;
+  _d->addr = __builtin_return_address(0);
+}
 	if ((mutex_ptr == NULL) || (mutex_ptr->valid != CORE_MUTEX_VALID))
 		return -EINVAL;

-- 
_______________________________________________
meta-freescale mailing list
[email protected]
https://lists.yoctoproject.org/listinfo/meta-freescale

[meta-freescale] i.mx6sx linux-fslc sema4 bug

Reply via email to