Jitesh: You are indeed correct: the current code will not allow that. This will be addressed in the 1.3 release. Thanks for pointing this out!
Will

> On Nov 6, 2017, at 5:23 PM, Jitesh Shah <[email protected]> wrote:
>
> Alright,
> I think I have an idea of what's going on here.
>
> So the structure of my code is like this:
> os_mutex_pend();
> /* ... trigger operation here ... */
> os_sem_pend(); // Wait for operation to complete. ISR calls os_sem_release()
> os_mutex_release();
>
> First of all, the following variables from the task object are shared between
> the mutex and semaphore implementations: t_obj, t_flags, t_lockcnt, t_prio. This
> basically guarantees that one *cannot* nest mutex and semaphore operations.
> Overwriting t_obj on sem_pend() basically guarantees that the current task will
> never be removed from the mutex pending list.
>
> Secondly, I have some fundamental doubts about the semaphore implementation:
> 1) os_sem_release() code has a snippet like so:
>
>>   if (--current->t_lockcnt == 0) {
>>       current->t_flags &= ~OS_TASK_FLAG_LOCK_HELD;
>>   }
>
> For semaphores (as opposed to mutexes), the task that pends on a
> semaphore might NOT be the task that releases it. Thus, reducing
> the "lockcnt" or adjusting the flags field of the "current" task inside
> os_sem_release() doesn't make sense. What are your thoughts on this?
>
> Jitesh
>
> On Mon, Nov 6, 2017 at 4:18 PM, marko kiiskila <[email protected]> wrote:
>
>> I’d start by looking for memory corruption. You could try adding
>> guard variables around your send_mutex, and see if anything
>> stomps on them. Another option could be to change mutex_release() to
>> write something other than 0 at mu_owner, and then add a conditional
>> hardware watchpoint that checks whether anything tries to zero out mu_owner
>> of your send_mutex.
>>
>> Good luck with the hunt.
>>
>>> On Nov 6, 2017, at 3:39 PM, will sanfilippo <[email protected]> wrote:
>>>
>>> Yeah, Chris is right here. I did not read the email thoroughly enough,
>>> and if what I described happened, the owner would not be NULL. Sorry about
>>> that.
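Jitesh's two observations can be made concrete with a minimal C sketch. It is a hypothetical model, not Mynewt code: `struct model_task`, `model_block_on()`, `model_sem_release_snippet()` and `MODEL_FLAG_LOCK_HELD` are invented stand-ins, and only the body of `model_sem_release_snippet()` mirrors the snippet quoted from os_sem_release(). It shows that a single `t_obj` field can record only one object at a time, and that a release which adjusts `current` touches whichever task happens to be running when it executes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified model; NOT Mynewt's struct os_task.
 * It keeps just one "waiting on" pointer and a lock count, the kind of
 * per-task state the thread says the mutex and semaphore code share.
 */
#define MODEL_FLAG_LOCK_HELD 0x01u /* made-up flag value for this model */

struct model_task {
    void *t_obj;       /* the ONE object this task is recorded as blocked on */
    uint8_t t_flags;
    uint8_t t_lockcnt; /* lock count, touched by both mutex and sem paths */
};

/* Any pend that blocks records its object in the same field, so a second
 * pend silently replaces whatever the first one stored. */
static void model_block_on(struct model_task *t, void *obj)
{
    assert(obj != NULL);
    t->t_obj = obj;
}

/* Mirrors the snippet quoted from os_sem_release(): it adjusts whichever
 * task is "current" when the release runs (for example the task that was
 * interrupted by the ISR), not the task that pended on the semaphore. */
static void model_sem_release_snippet(struct model_task *current)
{
    if (--current->t_lockcnt == 0) {   /* a count of 0 wraps to 255 here */
        current->t_flags &= (uint8_t)~MODEL_FLAG_LOCK_HELD;
    }
}
```

For instance, if the task holding the mutex (t_lockcnt = 1) then blocks on the semaphore, its t_obj ends up pointing at the semaphore, and running the release snippet on behalf of an unrelated task whose t_lockcnt is 0 wraps that task's count to 255 while the pending task's count stays at 1.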
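Marko's two suggestions (guard words around the mutex, and a release path that never writes 0 to mu_owner) can be sketched in plain C. Everything here is hypothetical: `struct guarded_mutex`, the guard pattern `MODEL_GUARD` and the poison value `MODEL_OWNER_RELEASED` are made up for illustration, and `struct model_mutex` carries only mu_owner and mu_level rather than the real struct os_mutex. Wrapping the mutex in a struct is one way to make sure the guards sit on either side of it, since separate globals can be placed wherever the linker chooses. Before trying this in the kernel itself, it would be worth checking that nothing there relies on mu_owner being NULL when the mutex is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging values, chosen only to be easy to spot. */
#define MODEL_GUARD          0xA5A5A5A5u
#define MODEL_OWNER_RELEASED ((void *)(uintptr_t)0xDEADBEEFu)

/* Stand-in for the mutex; in this model NULL is never written on purpose. */
struct model_mutex {
    void *mu_owner;
    uint16_t mu_level;
};

/* Struct members keep declaration order, so the guards bracket the mutex
 * (possibly with padding in between). */
struct guarded_mutex {
    uint32_t guard_before;
    struct model_mutex mu;
    uint32_t guard_after;
};

static void guarded_mutex_init(struct guarded_mutex *g)
{
    g->guard_before = MODEL_GUARD;
    g->guard_after = MODEL_GUARD;
    g->mu.mu_owner = MODEL_OWNER_RELEASED;
    g->mu.mu_level = 0;
}

static bool guards_intact(const struct guarded_mutex *g)
{
    return g->guard_before == MODEL_GUARD && g->guard_after == MODEL_GUARD;
}

/* Call at interesting points (before pend, after release, in the ISR). */
static void check_mutex_guards(const struct guarded_mutex *g)
{
    assert(guards_intact(g));
}

/* Final release writes the poison value instead of 0, so a NULL owner can
 * only come from some other, unintended write. */
static void model_mutex_release_final(struct model_mutex *mu)
{
    mu->mu_level = 0;
    mu->mu_owner = MODEL_OWNER_RELEASED;
}

static bool owner_was_zeroed(const struct model_mutex *mu)
{
    return mu->mu_owner == NULL;
}
```

With the release path patched this way, a zeroed mu_owner is itself the symptom to trap. In gdb, one way to do that is `watch send_mutex.mu_owner` followed by `condition <n> send_mutex.mu_owner == 0`, where `<n>` is the watchpoint number gdb reports.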
>>>
>>> So while it would explain lockcnt and level, it would not explain why
>>> the owner is NULL, as failing to release the mutex would leave the owner set
>>> to something.
>>>
>>>> On Nov 6, 2017, at 3:33 PM, Christopher Collins <[email protected]> wrote:
>>>>
>>>> I agree that a mutex should never have a null owner and a nonzero level.
>>>>
>>>> Unfortunately, my first guess is some form of memory corruption:
>>>> it seems like a null value accidentally got written to `mu_owner`. I
>>>> could be missing it, but I don't see any logic error in the mutex code
>>>> that could cause this.
>>>>
>>>> Getting to the bottom of this is probably going to be difficult,
>>>> especially if it is not easy to reproduce. I don't know how valuable
>>>> they are, but my two suggestions are:
>>>>
>>>> 1. Look at the `.lst` file that newt generates during a build to
>>>> determine what object immediately follows the mutex in RAM. Maybe an
>>>> errant write intended for this object is clearing the owner field.
>>>>
>>>> 2. Instrument the code with a bunch of asserts and logs. Maybe you can
>>>> catch the problem shortly after it happens.
>>>>
>>>> Like I said, probably not the most helpful advice, but I don't think
>>>> this is going to be an easy one to solve!
>>>>
>>>> Chris
>>>>
>>>> On Mon, Nov 06, 2017 at 03:16:06PM -0800, Jitesh Shah wrote:
>>>>> Hey Will,
>>>>> Are you saying that because "mu_level" is set to 1?
>>>>>
>>>>> It is set to 1 because the last call to os_mutex_release() failed on
>>>>> account of "mu_owner" not matching. Thus, the task that got the mutex
>>>>> failed to release it. That explains t_lockcnt and mu_level, right?
>>>>>
>>>>> Jitesh
>>>>>
>>>>> On Mon, Nov 6, 2017 at 7:56 AM, will sanfilippo <[email protected]> wrote:
>>>>>
>>>>>> What this looks like to me is that there was a nested pend without the
>>>>>> same number of releases. Maybe some path out of some code that is rarely
>>>>>> hit where a mutex is granted but not released?
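Will's guess about a nested pend without a matching release can be illustrated with a minimal model of a nesting mutex's bookkeeping. `struct model_task`, `struct model_mutex`, `model_mutex_pend()` and `model_mutex_release()` are hypothetical and leave out blocking, wait lists and priority inheritance; the owner check in release reflects the failure mode Jitesh describes, where release fails if the caller is not mu_owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping-only model; not Mynewt's os_mutex code. */
struct model_task {
    uint8_t t_lockcnt;            /* number of mutexes this task holds */
};

struct model_mutex {
    struct model_task *mu_owner;
    uint16_t mu_level;            /* nesting depth held by mu_owner */
};

/* Returns 0 on success, -1 if another task owns it (would block). */
static int model_mutex_pend(struct model_mutex *mu, struct model_task *cur)
{
    if (mu->mu_level == 0) {      /* free: take it */
        mu->mu_owner = cur;
        mu->mu_level = 1;
        cur->t_lockcnt++;
        return 0;
    }
    if (mu->mu_owner == cur) {    /* nested pend by the current owner */
        mu->mu_level++;
        return 0;
    }
    return -1;
}

/* Returns 0 on success, -1 if the mutex is free or 'cur' is not the owner. */
static int model_mutex_release(struct model_mutex *mu, struct model_task *cur)
{
    if (mu->mu_level == 0 || mu->mu_owner != cur) {
        return -1;
    }
    if (--mu->mu_level != 0) {
        return 0;                 /* still held at an outer nesting level */
    }
    assert(cur->t_lockcnt > 0);   /* would underflow otherwise */
    cur->t_lockcnt--;
    mu->mu_owner = NULL;          /* no waiters in this model */
    return 0;
}
```

Two pends followed by a single release leave mu_level at 1 and t_lockcnt at 1, but mu_owner still points at the task. That is the point Will and Chris make in their follow-ups: an unmatched pend explains the level and lock count, not a NULL owner.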
>>>>>>
>>>>>> Just a guess...
>>>>>>
>>>>>>> On Nov 5, 2017, at 8:26 PM, Jitesh Shah <[email protected]> wrote:
>>>>>>>
>>>>>>> Hey Guys,
>>>>>>> I am running the v1.0.0 branch (0db6321a75deda126943aa187842da6b977cd1c1).
>>>>>>> I'm seeing some strange mutex behaviour.
>>>>>>>
>>>>>>> So once in a bazillion times, a mutex fails to release. Here is what the
>>>>>>> structure looks like when it fails:
>>>>>>>
>>>>>>>> (gdb) p/x send_mutex
>>>>>>>> $1 = {mu_head = {slh_first = 0x0}, _pad = 0x0, mu_prio = 0x1,
>>>>>>>> mu_level = 0x1, mu_owner = 0x0}
>>>>>>>
>>>>>>> Why is mu_owner set to 0? That causes the os_mutex_release call to fail,
>>>>>>> since the current task doesn't match the owner task anymore.
>>>>>>>
>>>>>>> The task that holds the mutex looks like this:
>>>>>>>
>>>>>>>> (gdb) p/x cent_task
>>>>>>>> $3 = {t_stackptr = 0x20008a28, t_stacktop = 0x20008ac8, t_stacksize = 0x80,
>>>>>>>> t_taskid = 0x6, t_prio = 0x1, t_state = 0x1, t_flags = 0x10,
>>>>>>>> t_lockcnt = 0x1, t_pad = 0x0,
>>>>>>>> t_name = 0x22378, t_func = 0x90ad, t_arg = 0x0, t_obj = 0x0,
>>>>>>>> t_sanity_check = {sc_checkin_last = 0x0, sc_checkin_itvl = 0x0,
>>>>>>>> sc_func = 0x0, sc_arg = 0x0, sc_next = {sle_next = 0x0}},
>>>>>>>> t_next_wakeup = 0x0, t_run_time = 0x0,
>>>>>>>> t_ctx_sw_cnt = 0x213d, t_os_task_list = {stqe_next = 0x0},
>>>>>>>> t_os_list = {tqe_next = 0x20001338, tqe_prev = 0x200001a8},
>>>>>>>> t_obj_list = {sle_next = 0x0}}
>>>>>>>
>>>>>>> Comparing t_prio and mu_prio confirms that this task is indeed
>>>>>>> holding the mutex (no other task is waiting on the mutex).
>>>>>>>
>>>>>>> What could set mu_owner to 0? My original theory was that if
>>>>>>> mutex_pend were called from an interrupt context, mu_owner would be 0.
>>>>>>> But in this case, the only task that uses the mutex is running an
>>>>>>> eventq, so that is unlikely.
>>>>>>>
>>>>>>> Any ideas?
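Following Chris's suggestion to instrument the code with asserts and logs, a check for the invariant he states (a mutex with a nonzero level must have a non-NULL owner) could look like the sketch below. `struct model_mutex` only mirrors the fields printed in the gdb dump and is not the real struct os_mutex; `mutex_invariant_ok()`, `check_mutex()` and the use of printf are assumptions standing in for the application's own helpers and logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in holding the fields shown by "p/x send_mutex". */
struct model_mutex {
    void *mu_head_first;          /* stands in for mu_head.slh_first */
    uint8_t mu_prio;
    uint16_t mu_level;
    void *mu_owner;
};

/* Invariant from the thread: a held mutex (mu_level != 0) has an owner. */
static bool mutex_invariant_ok(const struct model_mutex *mu)
{
    return mu->mu_level == 0 || mu->mu_owner != NULL;
}

/* Log first, then assert, so the log records which call site ('where')
 * first saw the broken state before the assert stops the program. */
static void check_mutex(const struct model_mutex *mu, const char *where)
{
    if (!mutex_invariant_ok(mu)) {
        printf("mutex invariant broken at %s: level=%u owner=%p\n",
               where, (unsigned)mu->mu_level, mu->mu_owner);
    }
    assert(mutex_invariant_ok(mu));
}
```

Fed the values from the gdb dump (mu_level = 0x1, mu_owner = 0x0), mutex_invariant_ok() returns false. Placing check_mutex() calls around the pend and release sites, and in code that runs near the mutex in time or in memory, would narrow down when the owner field gets cleared.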
>>>>>>>
>>>>>>> Jitesh
>>>>>>>
>>>>>>> --
>>>>>>> This email including attachments contains Mad Apparel, Inc. DBA Athos
>>>>>>> privileged, confidential, and proprietary information solely for the use
>>>>>>> for the addressed recipients. If you are not the intended recipient, please
>>>>>>> be aware that any review, disclosure, copying, distribution, or use of the
>>>>>>> contents of this message is strictly prohibited. If you have received this
>>>>>>> in error, please delete it immediately and notify the sender. All rights
>>>>>>> reserved by Mad Apparel, Inc. 2012. The information contained herein is the
>>>>>>> exclusive property of Mad Apparel, Inc. and should not be used,
>>>>>>> distributed, reproduced, or disclosed in whole or in part without prior
>>>>>>> written permission of Mad Apparel, Inc.
