Looks ok to me. /Staffan
On 3 jun 2014, at 15:49, Aleksej Efimov <aleksej.efi...@oracle.com> wrote: > Staffan, David, > > Returning back to our WaitForMultipleObjects()/WAIT_ABANDONED discussions: > Thank you for your comments and remarks. I can't disagree with motivation > that it's better to have a fatal error during the incorrect mutex handling > then data corruption (the consequence of the previous fix). In case of such > error it'll be much more easier to debug/catch it (but as Staffan said - we > have tried to check all call paths and don't think that we'll encounter such > behavior). > I have modified the discussed code according to your suggestions: > http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.01/ > To abort the process the 'exitTransportWithError' function was utilized. > Also I have tried to check that behavior isn't changed by running "svc" > regression tests set. There was no related test failures observed. > > Best Regards, > Aleksej > > On 05/15/2014 01:11 PM, Staffan Larsen wrote: >> On 15 maj 2014, at 03:48, David Holmes <david.hol...@oracle.com> wrote: >> >>> On 14/05/2014 11:18 PM, Aleksej Efimov wrote: >>>> David, Vitaly, >>>> >>>> I totally agree with Vitaly's explanation (Vitaly - thank you for that) >>>> and code in shmemBase.c (the usage of enterMutex() function for >>>> sending/receiving bytes through shared memory connection) illustrates on >>>> how the connection shutdown event is used as a "cancellation token". >>> Thanks for clarifying that. >>> >>> So if we were to encounter an abandoned mutex the code would presently have >>> acquired the mutex but return an error, thus preventing a subsequent >>> release, and preventing other threads from acquiring (but allowing current >>> thread to recurisevely acquire. So this could both hang and cause data >>> corruption. >>> >>> The new code will still return an error but release the mutex. So no more >>> hangs (other than by conditions caused by data corruption) but more >>> opportunity for data corruption. >>> >>> Obviously it depends on exactly what happens in the critical sections >>> guarded by this mutex, but in general I don't agree with this rationale for >>> making the change: >>> >>> 204 /* If the mutex is abandoned we want to return an error >>> 205 * and also release the mutex so that we don't cause >>> 206 * other threads to be blocked. If a mutex was abandoned >>> 207 * we are in scary state. Data may be corrupted or inconsistent >>> 208 * but it is still better to let other threads run (and possibly >>> 209 * crash) than having them blocked (on the premise that a crash >>> 210 * is always easier to debug than a hang). >>> >>> Considering something has to have gone drastically wrong for the mutex to >>> become abandoned, I'm more inclined to consider this a fatal error and >>> abort. >>> >>> But I'll let the serviceability folk chime in here. >> I was involved in fixing this and writing the comment, so obviously I >> thought it a good solution :-) >> >> I do agree that it would probably be a good idea to consider this a fatal >> error and abort. At that point in the code we don’t have any really nice >> ways of doing that, though. We could just print an error and call abort(). >> What we are doing now is returning an error from sysIPMutexEnter() and >> delegating the error handling to the caller. We have tried to check all call >> paths to verify that they do “the right thing” in the face of an error. It >> is obviously hard to verify, but it looks like they all terminate the >> connection with some kind of error message. >> >> /Staffan >> >> >>> Thanks, >>> David >>> >>> >>>> Thank you, >>>> -Aleksej >>>> >>>> >>>> On 05/14/2014 01:05 PM, David Holmes wrote: >>>>> On 14/05/2014 11:06 AM, Vitaly Davidovich wrote: >>>>>> In windows, you acquire a mutex by waiting on it using one of the wait >>>>>> functions, one of them employed in the code in question. If >>>>>> WaitForMultipleObjects succeeds and returns the index of the mutex, >>>>>> current thread has ownership now. >>>>> Yes I understand the basic mechanics :) >>>>> >>>>>> It's also common to use multi wait functions where the event is a >>>>>> "cancelation token", e.g. manual reset event; this allows someone to >>>>>> cancel waiting on mutex acquisition and return from the wait function. >>>>>> Presumably that's the case here, but I'll let Aleksej confirm; just >>>>>> wanted to throw this out there in the meantime :). >>>>> Ah I see - yes cancellable lock acquisition would make sense. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Sent from my phone >>>>>> >>>>>> On May 13, 2014 6:46 PM, "David Holmes" <david.hol...@oracle.com >>>>>> <mailto:david.hol...@oracle.com>> wrote: >>>>>> >>>>>> Hi Aleksej, >>>>>> >>>>>> Thanks for the doc references regarding abandonment. >>>>>> >>>>>> Let me rephrase my question. What is this logic trying to achieve by >>>>>> waiting on both a mutex and an event? Do we already own the mutex >>>>>> when this function is called? >>>>>> >>>>>> David >>>>>> >>>>>> On 13/05/2014 11:19 PM, Aleksej Efimov wrote: >>>>>> >>>>>> David, >>>>>> >>>>>> The Windows has a different terminology for mutex objects (much >>>>>> differs >>>>>> from the POSIX one). This one link gave me some understanding of >>>>>> it [1]. >>>>>> >>>>>> Here is the MSDN [1] description of what "abandoned mutex" is: >>>>>> " If a thread terminates without releasing its ownership of a >>>>>> mutex >>>>>> object, the mutex object is considered to be abandoned. A >>>>>> waiting thread >>>>>> can acquire ownership of an abandoned mutex object, but the wait >>>>>> function will return*WAIT_ABANDONED*to indicate that the mutex >>>>>> object is >>>>>> abandoned. An abandoned mutex object indicates that an error has >>>>>> occurred and that any shared resource being protected by the >>>>>> mutex >>>>>> object is in an undefined state. If the thread proceeds as >>>>>> though the >>>>>> mutex object had not been abandoned, it is no longer considered >>>>>> abandoned after the thread releases its ownership. This restores >>>>>> normal >>>>>> behavior if a handle to the mutex object is subsequently >>>>>> specified in a >>>>>> wait function." >>>>>> >>>>>> >>>>>> What does it mean to wait on mutex and ownership of the mutex >>>>>> object: >>>>>> "Any thread with a handle to a mutex object can use one of >>>>>> thewait >>>>>> functions >>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx >>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>to >>>>>> request ownership of the mutex object. If the mutex object is >>>>>> owned by >>>>>> another thread, the wait function blocks the requesting thread >>>>>> until the >>>>>> owning thread releases the mutex object using the*ReleaseMutex* >>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx >>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__function." >>>>>> >>>>>> How we can release mutex and wait on already owned mutex: >>>>>> " After a thread obtains ownership of a mutex, it can specify >>>>>> the same >>>>>> mutex in repeated calls to the wait-functions >>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx >>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>__without >>>>>> blocking its execution. This prevents a thread from deadlocking >>>>>> itself >>>>>> while waiting for a mutex that it already owns. To release its >>>>>> ownership >>>>>> under such circumstances, the thread must call*ReleaseMutex* >>>>>> <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx >>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__once >>>>>> for each time that the mutex satisfied the conditions of a wait >>>>>> function." >>>>>> >>>>>> [1] >>>>>> http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms684266(v=vs.85).aspx >>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms684266(v=vs.85).aspx> >>>>>> >>>>>> -Aleksej >>>>>> >>>>>> On 05/13/2014 04:00 PM, David Holmes wrote: >>>>>> >>>>>> I don't understand this one at all. What is an "abandoned >>>>>> mutex"? For >>>>>> that matter why does the code wait on a mutex and an event? >>>>>> Do we >>>>>> already own the mutex? If so what does it mean to wait on >>>>>> it? If not >>>>>> then how can we release it? >>>>>> >>>>>> ??? >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> >>>>>> On 13/05/2014 8:57 PM, Alan Bateman wrote: >>>>>> >>>>>> >>>>>> This is debugger's shared memory transport so cc'ing >>>>>> serviceability-dev >>>>>> as that is there this code is maintained. >>>>>> >>>>>> Is there a test case or any outline of the conditions >>>>>> that cause this? I >>>>>> think that would be useful to understand the issue >>>>>> further. >>>>>> >>>>>> -Alan >>>>>> >>>>>> On 13/05/2014 11:46, Aleksej Efimov wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Can I have a review for 8032901 bug [1] fix [2]. >>>>>> There is a possible >>>>>> case when 'WaitForMultipleObjects' function can >>>>>> return the >>>>>> WAIT_ABANDONED_0 [3] error value. >>>>>> In such case it's better to release the mutex and >>>>>> return error value. >>>>>> This will prevent other threads to be blocked on >>>>>> abandoned mutex. >>>>>> >>>>>> Thank you, >>>>>> Aleksej >>>>>> >>>>>> [1] >>>>>> https://bugs.openjdk.java.net/__browse/JDK-8032901 >>>>>> <https://bugs.openjdk.java.net/browse/JDK-8032901> >>>>>> [2] >>>>>> http://cr.openjdk.java.net/~__aefimov/8032901/9/webrev.00/ >>>>>> <http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/> >>>>>> [3] >>>>>> http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687025(v=vs.85).aspx >>>>>> <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx> >>>>>> >>>>>> >>>>>> >>>>>> >