On 8/19/2025 1:39 AM, Nala Ginrut wrote:
To my understand, the current activity is to full fill the worker
queue first, then execute them one by one.
However, if the worker count is zero, the fulfillment will always
return an empty queue, so that the existing futures can't be executed.
Don't see whatever 'the fulfillment' is supposed to be, that term
doesn't appear in the code.
Also, where did you get 'worker count is zero => always empty queue'
from? This isn't supported by the code, futures are added to queues
whether there are workers or not, and futures don't disappear from
queues out of nowhere.
Going through the code, there are two queues: the %futures queue, and
the %futures-waiting queue. Maybe you meant the second but it was really
non-obvious. (or maybe you meant the first) Also, there is no queue of
workers (there is a list of workers, but it is never actually used
except to check whether workers were created). So really, it is unclear
what queue you are referring to.
Now, going through the code, there does appear to be an issue, and it is
related to emptiness of %futures (detailed at the end), with as result
that sometimes aren't executed, so if you replace 'always => sometimes',
'workers => futures" and move 'if the worker count is zero' to the end'
(*) you are correct, but by then it's quite a different statement, and
it still doesn't indicate the cause.
(*) e.g. "the queue is sometimes empty, with as a result that sometimes
futures can't be executed when the worker count is zero"
(On how 'worker count zero->always empty queue & no execution' doesn't
follow) Let's go through it:
1. Make a future, in a situation where %worker-count is 0.
2. The initial state of the future is 'queued'
(https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n72)
3. This then is added to the queue:
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n113
(so, no empty queue!)
4. (nothing happens yet, because there are no workers)
5. Touch the future.
6. Because the state of the future is not 'done' and not 'started', we
are in the 'else' case:
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n254
7. So, (work) is performed.
8. The queue isn't empty (queues don't spontaneously lose their
elements), so now (process-one-future) is performed.
9. In process-one-future, the queue still isn't empty.
10. It removed some future from the queue. Let's assume it's the future
we are interested in (if not, things are done in a loop, so should
get there eventually)
11. The future isn't done or started, so we are in the 'else' case
(https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n175)
12. It then is marked as 'started', and '(process-future! future)' is
performed.
13. (process-future! ...) calls the thunk of future, and if completed,
stores the result and returns #true (and also signals the
condition). In case of suspension, it marks the future as 'queued' again
14. (a) one option is that it is put back in the %futures queue (-> go
to next iteration, in which maybe we'll hit the bug described later)
15. (b) the other option is that it is put in %futures-waiting (without
being put in %futures)
16. When the waited-on future completes, it moves relevant futures from
%futures-waiting to %futures
Note that none of the above steps so far depend on whether there are
workers or not (except the '_maybe_ hit the bug' side remark).
But, there appears to be an incorrect check. Look at the (eq? 'done ...)
line in (work):
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n235
|(define (work) ;; Do some work while waiting for FUTURE to complete.
(lock-mutex %futures-mutex) (if (q-empty? %futures) (begin
(unlock-mutex %futures-mutex) (with-mutex (future-mutex future)
(unless (eq? 'done (future-state future)) (wait-condition-variable
(future-completion future) (future-mutex future))))) (begin
(process-one-future) (unlock-mutex %futures-mutex)))) |
The potential states are 'done', 'started' and 'queued'. The last case
is done wrong (it's ok with workers, but not in general)
* if 'done', obviously it shouldn't wait. Pretty much finished!
(handled correctly)
* if 'started', some thread is performing the worker. So, we should
wait until it is completed or suspended. First case:
(futures-completion future), second case: (futures-mutex future)
(handled correctly)
* if 'queued' and there are workers, then we can wait for a worker to
pick it up (handled correctly) (or we could do it ourself, but
doesn't really matter)
* if 'queued' and there are no workers, then we have to do it ourself!
(handled _incorrectly_!!!)
Summary: done->no wait, started->wait, queued->no wait, last case not
handled generally correct.
So, (unless (eq? 'done (future-state future)) ...) needs to be changed
to (when (eq? 'started (future-state future)) ...). Perhaps give that a
try (instead of the %worker-count adjustment). (^^^)
I think the issue of the current implementation is to mix the two
different situations, thread-unsupported and single-CPU. The simplest
fix is to make sure there is always a worker count.
No. You can't ensure there are always workers, because workers require
thread support, and threads aren't always supported. (Unless you intend
to move fibers into guile or something like that.)
In status quo, '(A) multi-thread support, but zero workers' and '(B) no
thread support, so zero workers' presumably behave the same (because
%worker-count is the same in both). So, they presumably have the same
set of futures-related bugs. And while (A) can be modified to ensure >=1
workers, this can't be done for (B). So, if there is a bug, the bug
remains for (B) after the 'adjust %worker-count patch'.
As mentioned previously:
Also, %worker-count can still be 0 now, in case no thread support is
available. So, when no thread support is available, the same would
presumably happen. So if it is a bug in (ice-9 futures), it is not clear
that it is actually fixed.
Also, the other change (^^^) is quite simple too, and (assuming I didn't
miss something) works in general, not only in the 'has thread support'
situation (unlike the %worker-count adjustment patch), and unlike your
patch has no change in semantics (well, except in the sense that that
bugs are unintended semantics). So (assuming no mistakes), (^^^) is a
bit simpler (no semantics change), (^^^) handles more cases and (^^^)
addresses the root of the issue.
Best regards,
Maxime Devos