Re: [PATCH] Fix futures, %worker-count shouldn't be zero to prevent halting

Maxime Devos Wed, 20 Aug 2025 04:25:52 -0700

On 8/19/2025 1:39 AM, Nala Ginrut wrote:

To my understand, the current activity is to full fill the workerqueue first, then execute them one by one.However, if the worker count is zero, the fulfillment will alwaysreturn an empty queue, so that the existing futures can't be executed.

Don't see whatever 'the fulfillment' is supposed to be, that termdoesn't appear in the code.

Also, where did you get 'worker count is zero => always empty queue'from? This isn't supported by the code, futures are added to queueswhether there are workers or not, and futures don't disappear fromqueues out of nowhere.

Going through the code, there are two queues: the %futures queue, andthe %futures-waiting queue. Maybe you meant the second but it was reallynon-obvious. (or maybe you meant the first) Also, there is no queue ofworkers (there is a list of workers, but it is never actually usedexcept to check whether workers were created). So really, it is unclearwhat queue you are referring to.

Now, going through the code, there does appear to be an issue, and it isrelated to emptiness of %futures (detailed at the end), with as resultthat sometimes aren't executed, so if you replace 'always => sometimes','workers => futures" and move 'if the worker count is zero' to the end'(*) you are correct, but by then it's quite a different statement, andit still doesn't indicate the cause.

(*) e.g. "the queue is sometimes empty, with as a result that sometimesfutures can't be executed when the worker count is zero"

(On how 'worker count zero->always empty queue & no execution' doesn'tfollow) Let's go through it:


1. Make a future, in a situation where %worker-count is 0.
2. The initial state of the future is 'queued'
   
(https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n72)
3. This then is added to the queue:
   
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n113
   (so, no empty queue!)
4. (nothing happens yet, because there are no workers)
5. Touch the future.
6. Because the state of the future is not 'done' and not 'started', we
   are in the 'else' case:
   
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n254
7. So, (work) is performed.
8. The queue isn't empty (queues don't spontaneously lose their
   elements), so now (process-one-future) is performed.
9. In process-one-future, the queue still isn't empty.
10. It removed some future from the queue. Let's assume it's the future
   we are interested in (if not, things are done in a loop, so should
   get there eventually)
11. The future isn't done or started, so we are in the 'else' case
   
(https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n175)
12. It then is marked as 'started', and '(process-future! future)' is
   performed.
13. (process-future! ...) calls the thunk of future, and if completed,
   stores the result and returns  #true (and also signals the
   condition). In case of suspension, it marks the future as 'queued' again
14. (a) one option is that it is put back in the %futures queue (-> go
   to next iteration, in which maybe we'll hit the bug described later)
15. (b) the other option is that it is put in %futures-waiting (without
   being put in %futures)
16. When the waited-on future completes, it moves relevant futures from
   %futures-waiting to %futures

Note that none of the above steps so far depend on whether there areworkers or not (except the '_maybe_ hit the bug' side remark).

But, there appears to be an incorrect check. Look at the (eq? 'done ...)line in (work):


https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n235

|(define (work) ;; Do some work while waiting for FUTURE to complete.(lock-mutex %futures-mutex) (if (q-empty? %futures) (begin(unlock-mutex %futures-mutex) (with-mutex (future-mutex future)(unless (eq? 'done (future-state future)) (wait-condition-variable(future-completion future) (future-mutex future))))) (begin(process-one-future) (unlock-mutex %futures-mutex)))) |

The potential states are 'done', 'started' and 'queued'. The last caseis done wrong (it's ok with workers, but not in general)


 * if 'done', obviously it shouldn't wait. Pretty much finished!
   (handled correctly)

 * if 'started', some thread is performing the worker. So, we should
   wait until it is completed or suspended. First case:
   (futures-completion future), second case: (futures-mutex future) 
     (handled correctly)
 * if 'queued' and there are workers, then we can wait for a worker to
   pick it up (handled correctly)  (or we could do it ourself, but
   doesn't really matter)
 * if 'queued' and there are no workers, then we have to do it ourself!
   (handled _incorrectly_!!!)

Summary: done->no wait, started->wait, queued->no wait, last case nothandled generally correct.

So, (unless (eq? 'done (future-state future)) ...) needs to be changedto (when (eq? 'started (future-state future)) ...). Perhaps give that atry (instead of the %worker-count adjustment). (^^^)

I think the issue of the current implementation is to mix the twodifferent situations, thread-unsupported and single-CPU. The simplestfix is to make sure there is always a worker count.

No. You can't ensure there are always workers, because workers requirethread support, and threads aren't always supported. (Unless you intendto move fibers into guile or something like that.)

In status quo, '(A) multi-thread support, but zero workers' and '(B) nothread support, so zero workers' presumably behave the same (because%worker-count is the same in both). So, they presumably have the sameset of futures-related bugs. And while (A) can be modified to ensure >=1workers, this can't be done for (B). So, if there is a bug, the bugremains for (B) after the 'adjust %worker-count patch'.


As mentioned previously:

Also, %worker-count can still be 0 now, in case no thread support is
available. So, when no thread support is available, the same would
presumably happen. So if it is a bug in (ice-9 futures), it is not clear
that it is actually fixed.

Also, the other change (^^^) is quite simple too, and (assuming I didn'tmiss something) works in general, not only in the 'has thread support'situation (unlike the %worker-count adjustment patch), and unlike yourpatch has no change in semantics (well, except in the sense that thatbugs are unintended semantics). So (assuming no mistakes), (^^^) is abit simpler (no semantics change), (^^^) handles more cases and (^^^)addresses the root of the issue.


Best regards,
Maxime Devos

Re: [PATCH] Fix futures, %worker-count shouldn't be zero to prevent halting

Reply via email to