[... AI-136 ...] You mentioned glibc... <chuckles> ask Ulrich Drepper:
https://www.opengroup.org/austin/aardvark/latest/xshbug2.txt

"_____________________________________________________________________________
OBJECTION Enhancement Request Number 207
drepper:xxxxxxxxxx Defect in XSH Threads (rdvk# 2)
{ud-cancel-timeout} Fri, 25 May 2007 20:55:51 +0100 (BST)
_____________________________________________________________________________
Accept_X___ Accept as marked below_____ Duplicate_____ Reject_____
Rationale for rejected or partial changes:
ERN 207 (->interps) AI-136"

Good luck. :-)

regards,
alexander.

-----Dimitri Staessens <dimitri.staess...@ugent.be> wrote: -----
To: Alexander Terekhov <terek...@de.ibm.com>
From: Dimitri Staessens <dimitri.staess...@ugent.be>
Date: 06/14/2017 01:20PM
Cc: austin-group-l@opengroup.org
Subject: Re: Fwd: Re: request for clarification on Open Group Base Specifications Issue 7: Canc...

Hi Alexander,

I'm just trying to understand. This single change:

https://collaboration.opengroup.org/austin/interps/documents/14359/AI-136.txt

has a significant impact on the API. It impacts very important calls such as select(), pthread_cond_timedwait() and many others.

Previous to the change, every cancellation point, whether it has a timeout or not, had to cancel the thread if there was a pending cancellation request before the cancellation point was called. If there was a cancellation request when a cancellation point was already being executed (the thread that needs to be cancelled was suspended at the cancellation point, either by the scheduler or because it was waiting on the condition or a timeout or whatever), it was up to the implementation to choose whether it cancels the thread or returns and defers the cancellation to the next cancellation point. This is to make implementations more efficient.
In other words, prior to AI-136, this code would always cancel a thread eventually:

    while (1) {
        /* some code */
        cancellation_point();
        /* some more code */
    }

regardless of the input to the function that was defined as a cancellation point. It may not do so immediately; it may do one more loop if the thread was executing cancellation_point() when the cancellation request came.

After the change, this is not the case anymore. The above code might defer the cancellation ad infinitum. To achieve the same result, cancellation points with a timeout would all need an extra clause:

    while (1) {
        /* some code */
        if (cancellation_point() == ETIMEDOUT)
            pthread_testcancel();
        /* some more code */
    }

The actual execution depends on the implementation of the cancellation point, since it's "undefined behaviour" whether it returns timeout or cancels. This extra check may negatively impact performance if the intention of the call with an expired timeout is to perform a "poll" (e.g. a select with a 0 timeout).

Is it the intention of the change in AI-136 that every POSIX-compliant program ever written now implements the change above? If this is the intention, can you explain to me why? It changes previously defined behaviour into undefined behaviour.

cheers,
Dimitri

On 06/14/17 13:33, Alexander Terekhov wrote:

What is "effectively in a waiting state"? A thread does not have to be blocked "waiting" for an event... And pthread_testcancel() won't help in the case of 'tomorrow' events and blocking further execution while waiting for an event.

Feel free to file a bug report and I'll request explicit opposite change regarding "shall occur" list and "an event that a thread is waiting for has occurred".

regards,
alexander.

From: Dimitri Staessens <dimitri.staess...@ugent.be>
To: austin-group-l@opengroup.org
Date: 14.06.2017 11:32
Subject: Fwd: Re: request for clarification on Open Group Base Specifications Issue 7: Canc...
The same document also states:

Cancellation points are points inside of certain functions where a thread has to act on any pending cancellation request when cancelability is enabled. For functions in the "shall occur" list, a cancellation check must be performed on every call regardless of whether, absent the cancellation, the call would have blocked.

The phrase you mention has a clause "an event that a thread is waiting for has occurred", which implies the thread is effectively in a waiting state. It might have performed the cancellation check before that, and that's specifically allowed. But it must perform the check.

What is not to be allowed is that a function that is defined as a cancellation point doesn't perform a check at all before returning. Because if that cancellation point is the only one, it may make the thread uncancellable depending on input over which the implementor may have no control whatsoever.

Having to perform a check on the result (that may or may not occur) and calling pthread_testcancel() is not a consistent API. In that case, it would be better to define the function as not being a cancellation point at all and then people would have to implement a check and a pthread_testcancel().

On 06/14/17 11:49, Alexander Terekhov wrote:

http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_09_24

"Due to the nature of cancellation, there is generally no synchronization between the thread requesting the cancellation of a blocked thread and events that may cause that thread to resume execution. For this reason, and because excess serialization hurts performance, when both an event that a thread is waiting for has occurred and a cancellation request has been made and cancellation is enabled, POSIX.1-2008 explicitly allows the implementation to choose between returning from the blocking call or acting on the cancellation request."

I think that the above makes sense also for the events that happened 'yesterday' (e.g.
absolutely expired timeouts) when a 'blocking call' does not really 'block' in a metaphysical sense and only reports an occurrence of an event.

regards,
alexander.

From: Dimitri Staessens <dimitri.staess...@ugent.be>
To: shwares...@aol.com, austin-group-l@opengroup.org
Date: 14.06.2017 08:33
Subject: Re: request for clarification on Open Group Base Specifications Issue 7: Canc...

On 06/14/17 01:18, SHwareSyst@aol.com wrote:

While that is the link for general consumption, it's also available at:

https://www.opengroup.org/austin/login.html

along with other versions and support documents.

Thanks!

As to the text, when a preemptive scheduler is being used there's a remote chance an interface can perform the cancellation check on entry and see none are pending, yet lose its time slice on the statement after the check and be blocked that way. In normal use this is the primary reason a thread will block, as the time slices will be small in duration to give an appearance of parallelism. An awakened thread can then issue a cancel request, but this does not have to be processed by the blocked thread until the next cancellation point after it's awakened again. This applies to all interfaces in the "shall occur" list, and "may occur" one, along with other reasons to block such as timers or waiting on a device for read(), in how they're phrased.

I fully agree. But if the cancellation request was pending before the call to the cancellation point, cancellation points in the "shall occur" list have to check the cancellation prior to returning. I'll give it some thought how to phrase this with the least chance for misinterpretation.

They're not supposed to assume threads block only for those secondary reasons, as occurs with a non-preemptive scheduler that requires threads to use sched_yield() to allow another thread to resume. Because of this an interface may block at least twice during the same call to it, first due to losing time slice and then interface specific reasons.
With really small time slices the time needed to release resources such as mutexes can cause additional time slice expiration blocks after the interface specific block too.

In a message dated 6/13/2017 11:51:50 A.M. Eastern Daylight Time, dimitri.staessens@ugent.be writes:

Can someone confirm that this is the correct full version of the specification to reference if I file a bug report?

http://ieeexplore.ieee.org/document/7582338/

On 06/13/17 18:05, Dimitri Staessens wrote:

So apparently this change was somehow intended to allow the case. But it is nevertheless wrong since it contradicts the following statement:

"For functions in the "shall occur" list, a cancellation check must be performed on every call regardless of whether, absent the cancellation, the call would have blocked."

So even if there is a timeout, those functions have to check cancellation and the behaviour is thus not undefined. The behaviour may only be undefined if the cancellation point already performed the check and is now suspended.

I'd like to file a bug report and propose a change to the specification to fix this. I'm new to this group and saw that the bug reports reference page numbers. Can someone point me to where I can find the official document so I can make the correct references?

Thanks,
Dimitri

On 06/13/17 15:13, Dimitri Staessens wrote:

Hi Geoff,

Awesome service, thanks!

Dimitri

On 06/13/17 13:35, Geoff Clare wrote:

Dimitri Staessens <dimitri.staess...@ugent.be> wrote, on 12 Jun 2017:

Is there a way for me to track down the people that are responsible for this adjustment in the specification so that they can comment on their intentions and motivations for making it?

https://collaboration.opengroup.org/austin/interps/documents/14359/AI-136.txt
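
For readers following the thread, here is a minimal sketch of the workaround loop discussed above (checking for ETIMEDOUT and then calling pthread_testcancel()). It assumes pthread_cond_timedwait() as the timed cancellation point and an absolute timeout that has already expired, so every call acts as a poll. The names poll_worker, unlock_mutex and worker_was_cancelled are hypothetical and chosen only for illustration; none of them appear in the thread or in POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Cleanup handler (hypothetical name): if the thread is cancelled inside
 * pthread_cond_timedwait(), the mutex is reacquired before the handlers
 * run, so it has to be released here. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/* Worker (hypothetical name): polls with a timeout that expired long ago. */
static void *poll_worker(void *arg)
{
    /* Absolute time at the epoch: always in the past. */
    struct timespec expired = { .tv_sec = 0, .tv_nsec = 0 };
    (void)arg;

    for (;;) {
        int rc;

        pthread_mutex_lock(&lock);
        pthread_cleanup_push(unlock_mutex, &lock);
        rc = pthread_cond_timedwait(&cond, &lock, &expired);
        pthread_cleanup_pop(1);     /* pop and run the handler: unlock */

        /* The extra clause from the thread: if the wait reported a
         * timeout instead of acting on a pending request, check now. */
        if (rc == ETIMEDOUT)
            pthread_testcancel();
    }
    return NULL;                    /* not reached */
}

/* Starts the worker, requests cancellation, and returns 1 if the worker
 * terminated by cancellation, 0 otherwise. */
int worker_was_cancelled(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, poll_worker, NULL) != 0)
        return 0;
    if (pthread_cancel(tid) != 0)
        return 0;
    if (pthread_join(tid, &result) != 0)
        return 0;
    return result == PTHREAD_CANCELED;
}
```

With the explicit pthread_testcancel() in place, worker_was_cancelled() is expected to return 1 whether or not the implementation acts on the request inside the timed wait. Under the reading of AI-136 discussed in this thread, dropping that call would leave the outcome up to the implementation, and the join could in principle wait forever.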