Jitesh:

You are indeed correct: the current code will not allow that. This will be 
addressed in the 1.3 release. Thanks for pointing this out!

Will
> On Nov 6, 2017, at 5:23 PM, Jitesh Shah <[email protected]> wrote:
> 
> Alright,
> I think I have an idea of whats going on here.
> 
> So the structure of my code is like this ->
> os_mutex_pend();
> .. trigger operation here ..
> os_sem_pend();  // Wait for operation to complete. ISR calls
> os_sem_release()
> os_mutex_release();
> 
> First of all following variables from task object are shared between mutex
> and semaphore implementations: t_obj, t_flags, t_lockcnt, t_prio. This
> basically guarantees that one *cannot* nest mutex and semaphore operations.
> Overwriting t_obj on sem_pend() basically guarantees that current task will
> never be off the mutex pending list.
> 
> Secondly, I have some fundamental doubts about the semaphore implementation:
> 1) os_sem_release() code has a snippet like so:
> 
>>    if (--current->t_lockcnt == 0) {
>>        current->t_flags &= ~OS_TASK_FLAG_LOCK_HELD;
>>    }
> 
> For semaphores, (as opposed to mutexes) the task which is pending on
> semaphore might NOT be the task that releases the semaphore. Thus, reducing
> the "lockcnt" or adjusting the flags field of the "current" task inside
> os_sem_release() doesn't make sense. What are your thoughts on this?
> 
> Jitesh
> 
> On Mon, Nov 6, 2017 at 4:18 PM, marko kiiskila <[email protected]> wrote:
> 
>> I’d start by looking for memory corruption. You could try adding
>> guard variables around your send_mutex(), and see if anything
>> stomps on them. Another option could be to change mutex_release() to
>> write something other than 0 at mu_owner, and then add a conditional
>> hardware watchpoint which looks if anything tries to zero out mu_owner
>> of your send_mutex.
>> 
>> Good luck with the hunt.
>> 
>>> On Nov 6, 2017, at 3:39 PM, will sanfilippo <[email protected]> wrote:
>>> 
>>> Yeah, Chris is right here. I did not read the email thoroughly enough
>> and if what I described happened, the owner would not be NULL. Sorry about
>> that.
>>> 
>>> So while it would explain lockcnt and level, it would not explain why
>> the owner is NULL, as failing to release the mutex would have the owner set
>> to something.
>>> 
>>> 
>>> 
>>>> On Nov 6, 2017, at 3:33 PM, Christopher Collins <[email protected]>
>> wrote:
>>>> 
>>>> I agree that a mutex should never have a null owner and a nonzero level.
>>>> 
>>>> Unfortunately, my first guess is some form of memory corruption:
>>>> it seems like a null value accidentally got written to `mu_owner`.  I
>>>> could be missing it, but I don't see any logic error in the mutex code
>>>> which could cause this.
>>>> 
>>>> Getting to the bottom of this is probably going to be difficult,
>>>> especially if it is not easy to reproduce.  I don't know how valuable
>>>> they are, but my two suggestions are:
>>>> 
>>>> 1. Look at the `.lst` file that newt generates during a build to
>>>> determine what object immediately follows the mutex in RAM.  Maybe an
>>>> errant write intended for this object is clearing the owner field.
>>>> 
>>>> 2. Instrument the code with a bunch of asserts and logs.  Maybe you can
>>>> catch the problem shortly after it happens.
>>>> 
>>>> Like I said, probably not the most helpful advice, but I don't think
>>>> this is going to be an easy one to solve!
>>>> 
>>>> Chris
>>>> 
>>>> On Mon, Nov 06, 2017 at 03:16:06PM -0800, Jitesh Shah wrote:
>>>>> Hey wil,
>>>>> Are you saying that because "mu_level" is set to 1?
>>>>> 
>>>>> It is set to 1 because the last call to os_mutex_release() failed on
>>>>> account of "mu_owner" not matching. Thus, the task that got the mutex
>>>>> failed to release it. That explains t_lockcnt and mu_level, right?
>>>>> 
>>>>> Jitesh
>>>>> 
>>>>> On Mon, Nov 6, 2017 at 7:56 AM, will sanfilippo <[email protected]>
>> wrote:
>>>>> 
>>>>>> What this looks like to me is that there was a nested pend without the
>>>>>> same number of releases. Maybe some path out of some code that is
>> rarely
>>>>>> hit where a mutex is granted but not released?
>>>>>> 
>>>>>> Just a guess...
>>>>>> 
>>>>>>> On Nov 5, 2017, at 8:26 PM, Jitesh Shah <[email protected]>
>> wrote:
>>>>>>> 
>>>>>>> Hey Guys,
>>>>>>> I am running v1.0.0 branch (0db6321a75deda126943aa187842da
>> 6b977cd1c1).
>>>>>>> Seeing some strange mutex behaviour.
>>>>>>> 
>>>>>>> So once in a bazillion times, a mutex fails to release. Here is how
>> the
>>>>>>> structure looks like when it fails:
>>>>>>> 
>>>>>>>> (gdb) p/x send_mutex
>>>>>>>> $1 = {mu_head = {slh_first = 0x0}, _pad = 0x0, mu_prio = 0x1,
>> mu_level =
>>>>>>>> 0x1, mu_owner = 0x0}
>>>>>>> 
>>>>>>> 
>>>>>>> Why is mu_owner set to 0? That causes the os_mutex_release call to
>> fail
>>>>>>> since the current task doesn't match the owner task anymore.
>>>>>>> 
>>>>>>> The task which holds the mutex looks like this:
>>>>>>> 
>>>>>>>> (gdb) p/x cent_task
>>>>>>>> $3 = {t_stackptr = 0x20008a28, t_stacktop = 0x20008ac8, t_stacksize
>> =
>>>>>>>> 0x80, t_taskid = 0x6, t_prio = 0x1, t_state = 0x1, t_flags = 0x10,
>>>>>>>> t_lockcnt = 0x1, t_pad = 0x0,
>>>>>>>> t_name = 0x22378, t_func = 0x90ad, t_arg = 0x0, t_obj = 0x0,
>>>>>>>> t_sanity_check = {sc_checkin_last = 0x0, sc_checkin_itvl = 0x0,
>> sc_func
>>>>>> =
>>>>>>>> 0x0, sc_arg = 0x0, sc_next = {
>>>>>>>>   sle_next = 0x0}}, t_next_wakeup = 0x0, t_run_time = 0x0,
>>>>>>>> t_ctx_sw_cnt = 0x213d, t_os_task_list = {stqe_next = 0x0},
>> t_os_list =
>>>>>>>> {tqe_next = 0x20001338,
>>>>>>>> tqe_prev = 0x200001a8}, t_obj_list = {sle_next = 0x0}}
>>>>>>> 
>>>>>>> 
>>>>>>> Comparing t_prio and mu_prio, this confirms that this task is indeed
>>>>>>> holding the mutex (no other task is waiting on the mutex).
>>>>>>> 
>>>>>>> What can happen that set mu_owner to 0? My original theory was that
>> if a
>>>>>>> mutex_pend was called from an interrupt context, mu_owner would be
>> 0. But
>>>>>>> in this case, the only task that is calling mutex is running an
>> eventq,
>>>>>> so
>>>>>>> that is unlikely.
>>>>>>> 
>>>>>>> Any ideas?
>>>>>>> 
>>>>>>> Jitesh
>>>>>>> 
>>>>>>> --
>>>>>>> This email including attachments contains Mad Apparel, Inc. DBA Athos
>>>>>>> privileged, confidential, and proprietary information solely for the
>> use
>>>>>>> for the addressed recipients. If you are not the intended recipient,
>>>>>> please
>>>>>>> be aware that any review, disclosure, copying, distribution, or use
>> of
>>>>>> the
>>>>>>> contents of this message is strictly prohibited. If you have received
>>>>>> this
>>>>>>> in error, please delete it immediately and notify the sender. All
>> rights
>>>>>>> reserved by Mad Apparel, Inc. 2012. The information contained herein
>> is
>>>>>> the
>>>>>>> exclusive property of Mad Apparel, Inc. and should not be used,
>>>>>>> distributed, reproduced, or disclosed in whole or in part without
>> prior
>>>>>>> written permission of Mad Apparel, Inc.
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> This email including attachments contains Mad Apparel, Inc. DBA Athos
>>>>> privileged, confidential, and proprietary information solely for the
>> use
>>>>> for the addressed recipients. If you are not the intended recipient,
>> please
>>>>> be aware that any review, disclosure, copying, distribution, or use of
>> the
>>>>> contents of this message is strictly prohibited. If you have received
>> this
>>>>> in error, please delete it immediately and notify the sender. All
>> rights
>>>>> reserved by Mad Apparel, Inc. 2012. The information contained herein
>> is the
>>>>> exclusive property of Mad Apparel, Inc. and should not be used,
>>>>> distributed, reproduced, or disclosed in whole or in part without prior
>>>>> written permission of Mad Apparel, Inc.
>>> 
>> 
>> 
> 
> -- 
> This email including attachments contains Mad Apparel, Inc. DBA Athos 
> privileged, confidential, and proprietary information solely for the use 
> for the addressed recipients. If you are not the intended recipient, please 
> be aware that any review, disclosure, copying, distribution, or use of the 
> contents of this message is strictly prohibited. If you have received this 
> in error, please delete it immediately and notify the sender. All rights 
> reserved by Mad Apparel, Inc. 2012. The information contained herein is the 
> exclusive property of Mad Apparel, Inc. and should not be used, 
> distributed, reproduced, or disclosed in whole or in part without prior 
> written permission of Mad Apparel, Inc.

Reply via email to