Re: Fwd: Re: request for clarification on Open Group Base Specifications Issue 7: Canc...

Alexander Terekhov Wed, 14 Jun 2017 05:27:08 -0700

[... AI-136 ...]

You mentioned glibc... <chuckles> ask Ulrich Drepper:


https://www.opengroup.org/austin/aardvark/latest/xshbug2.txt

"_____________________________________________________________________________
OBJECTION                                      Enhancement Request Number 207
drepper:xxxxxxxxxx                           Defect in XSH Threads (rdvk#  2)
{ud-cancel-timeout}                     Fri, 25 May 2007 20:55:51 +0100 (BST)
_____________________________________________________________________________
Accept_X___    Accept as marked below_____     Duplicate_____     Reject_____
Rationale for rejected or partial changes:

ERN 207 (->interps) AI-136"

Good luck. :-)

regards,
alexander.

-----Dimitri Staessens <dimitri.staess...@ugent.be> wrote: -----
To: Alexander Terekhov <terek...@de.ibm.com>
From: Dimitri Staessens <dimitri.staess...@ugent.be>
Date: 06/14/2017 01:20PM
Cc: austin-group-l@opengroup.org
Subject: Re: Fwd: Re: request for clarification on Open Group Base Specifications Issue 7: Canc...

Hi Alexander,
I'm just trying to understand. 
This single change:
https://collaboration.opengroup.org/austin/interps/documents/14359/AI-136.txt
has a significant impact on the API. It impacts very important calls such as 
select(), pthread_cond_timedwait() and many others. Prior to the change, 
every cancellation point, whether it has a timeout or not, had to cancel the 
thread if there was a pending cancellation request before the cancellation 
point was called. 
If a cancellation request arrived while a cancellation point was already being 
executed (the thread to be cancelled was suspended at the cancellation point, 
either by the scheduler or because it was waiting on the condition, a timeout 
or whatever), it was up to the implementation to choose whether to cancel the 
thread or to return and defer the cancellation to the next cancellation point. 
This latitude exists to make implementations more efficient.
In other words, prior to AI-136, this code would always cancel a thread 
eventually:

        while (1) {
                /* some code */
                cancellation_point();
                /* some more code */
        } 
regardless of the input to the function that was defined as a cancellation 
point. It may not do so immediately: it may do one more loop if the thread was 
executing cancellation_point() when the cancellation request came.
After the change, this is not the case anymore. The above code might defer the 
cancellation ad infinitum. 
To achieve the same result, cancellation points with a timeout would all need 
an extra clause:
        while (1) {
                /* some code */
                if (cancellation_point() == ETIMEDOUT)
                        pthread_testcancel();
                /* some more code */
        }
The actual execution depends on the implementation of the cancellation point, 
since it is "undefined behaviour" whether it returns a timeout or cancels. This 
extra check may negatively impact performance if the intention of the call with 
an expired timeout is to perform a "poll" (e.g. a select() with a 0 timeout).
Is it the intention of the change in AI-136 that every POSIX-compliant program 
ever written must now implement the change above? If so, can you explain to me 
why? It changes previously defined behaviour into undefined behaviour.
cheers,
Dimitri

On 06/14/17 13:33, Alexander Terekhov wrote:
What is "effectively in a waiting state"?

A thread does not have to be blocked "waiting" for an event...

And pthread_testcancel() won't help in the case of 'tomorrow' events, where the 
call blocks further execution while waiting for the event.

Feel free to file a bug report and I'll request explicit opposite change 
regarding "shall occur" list and "an event that a thread is waiting for has 
occurred".

regards,
alexander.





From:        Dimitri Staessens <dimitri.staess...@ugent.be>
To:        austin-group-l@opengroup.org
Date:        14.06.2017 11:32
Subject:        Fwd: Re: request for clarification on Open Group Base Specifications Issue 7: Canc...




The same document also states: 
Cancellation points are points inside of certain functions where a thread has 
to act on any pending cancellation request when cancelability is enabled. For 
functions in the "shall occur" list, a cancellation check must be performed on 
every call regardless of whether, absent the cancellation, the call would have 
blocked.
The phrase you mention has a clause "an event that a thread is waiting for has 
occurred", which implies the thread is effectively in a waiting state. It might 
have performed the cancellation check before that, and that's specifically 
allowed. But it must perform the check.
What should not be allowed is for a function defined as a cancellation point to 
return without performing any check at all. If that cancellation point is the 
only one, this may make the thread uncancellable depending on input over which 
the implementor may have no control whatsoever. Having to check the result 
(since the cancellation check may or may not have occurred) and then call 
pthread_testcancel() is not a consistent API. In that case, it would be better 
to define the function as not being a cancellation point at all, and people 
would then have to implement a check and a pthread_testcancel() themselves.  
On 06/14/17 11:49, Alexander Terekhov wrote:
http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_09_24

"Due to the nature of cancellation, there is generally no synchronization 
between the thread requesting the cancellation of a blocked thread and events 
that may cause that thread to resume execution. For this reason, and because 
excess serialization hurts performance, when both an event that a thread is 
waiting for has occurred and a cancellation request has been made and 
cancellation is enabled, POSIX.1-2008 explicitly allows the implementation to 
choose between returning from the blocking call or acting on the cancellation 
request."

I think that the above also makes sense for events that happened 'yesterday' 
(e.g. absolute timeouts that have already expired), when a 'blocking call' does 
not really 'block' in a metaphysical sense and only reports the occurrence of an 
event.

regards,
alexander.



From:        Dimitri Staessens <dimitri.staess...@ugent.be>
To:        shwares...@aol.com, austin-group-l@opengroup.org
Date:        14.06.2017 08:33
Subject:        Re: request for clarification on Open Group Base Specifications Issue 7: Canc...




On 06/14/17 01:18, SHwareSyst@aol.com wrote:
While that is the link for general consumption, it's also available at:
https://www.opengroup.org/austin/login.html
along with other versions and support documents.
Thanks!
 
As to the text: when a preemptive scheduler is in use, there is a remote 
chance that an interface performs the cancellation check on entry, sees that 
none is pending, and then loses its time slice on the statement after the 
check and is blocked that way. In normal use this is the primary reason a 
thread will block, as time slices are kept short to give an appearance of 
parallelism. 
An awakened thread can then issue a cancellation request, but the blocked 
thread does not have to process it until the next cancellation point after 
it is awakened again. As the interfaces are phrased, this applies to all of 
those in the "shall occur" list and the "may occur" list, as well as to other 
reasons for blocking, such as timers or waiting on a device for read(). 
I fully agree. But if the cancellation request was pending before the call 
to the cancellation point, cancellation points in the "shall occur" list have 
to check for cancellation before returning. I'll give some thought to how to 
phrase this with the least chance of misinterpretation.
The interfaces are not supposed to assume that threads block only for those 
secondary reasons, as happens with a non-preemptive scheduler that requires 
threads to call sched_yield() to allow another thread to resume.
 
Because of this, an interface may block at least twice during the same call: 
first because it loses its time slice, and then for interface-specific 
reasons. With very small time slices, the time needed to release resources 
such as mutexes can cause additional blocks from time-slice expiration after 
the interface-specific block, too.
 
In a message dated 6/13/2017 11:51:50 A.M. Eastern Daylight Time, 
dimitri.staessens@ugent.be writes: 
Can someone confirm that this is the correct full version of the specification 
to reference if I file a bug report?
http://ieeexplore.ieee.org/document/7582338/
On 06/13/17 18:05, Dimitri Staessens wrote:
So apparently this change was somehow intended to allow this case. It is 
nevertheless wrong, since it contradicts the following statement:
"For functions in the "shall occur" list, a cancellation check must be 
performed on every call regardless of whether, absent the cancellation, the 
call would have blocked."

So even if there is a timeout, those functions have to check for cancellation, 
and the behaviour is thus not undefined. The behaviour may only be undefined if 
the cancellation point has already performed the check and is now suspended.

I'd like to file a bug report and propose a change to the specification to fix 
this. I'm new to this group and saw that the bug reports reference page 
numbers. Can someone point me to where I can find the official document so I 
can make the correct references?

Thanks,

Dimitri


On 06/13/17 15:13, Dimitri Staessens wrote:
Hi Geoff,

Awesome service, thanks!

Dimitri

On 06/13/17 13:35, Geoff Clare wrote:

Dimitri Staessens <dimitri.staess...@ugent.be> wrote, on 12 Jun 2017:

Is there a way for me to track down the people that are responsible for
this adjustment in the specification so that they can comment on their
intentions and motivations for making it?

https://collaboration.opengroup.org/austin/interps/documents/14359/AI-136.txt
