On 8/19/2025 1:39 AM, Nala Ginrut wrote:

To my understand, the current activity is to full fill the worker queue first, then execute them one by one. However, if the worker count is zero, the fulfillment will always return an empty queue, so that the existing futures can't be executed.

Don't see whatever 'the fulfillment' is supposed to be, that term doesn't appear in the code.

Also, where did you get 'worker count is zero => always empty queue' from? This isn't supported by the code, futures are added to queues whether there are workers or not, and futures don't disappear from queues out of nowhere.

Going through the code, there are two queues: the %futures queue, and the %futures-waiting queue. Maybe you meant the second but it was really non-obvious. (or maybe you meant the first) Also, there is no queue of workers (there is a list of workers, but it is never actually used except to check whether workers were created). So really, it is unclear what queue you are referring to.

Now, going through the code, there does appear to be an issue, and it is related to emptiness of %futures (detailed at the end), with as result that sometimes aren't executed, so if you replace 'always => sometimes', 'workers => futures" and move 'if the worker count is zero' to the end' (*) you are correct, but by then it's quite a different statement, and it still doesn't indicate the cause.

(*) e.g. "the queue is sometimes empty, with as a result that sometimes futures can't be executed when the worker count is zero"

(On how 'worker count zero->always empty queue & no execution' doesn't follow) Let's go through it:

1. Make a future, in a situation where %worker-count is 0.
2. The initial state of the future is 'queued'
   
(https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n72)
3. This then is added to the queue:
   
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n113
   (so, no empty queue!)
4. (nothing happens yet, because there are no workers)
5. Touch the future.
6. Because the state of the future is not 'done' and not 'started', we
   are in the 'else' case:
   
https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n254
7. So, (work) is performed.
8. The queue isn't empty (queues don't spontaneously lose their
   elements), so now (process-one-future) is performed.
9. In process-one-future, the queue still isn't empty.
10. It removed some future from the queue. Let's assume it's the future
   we are interested in (if not, things are done in a loop, so should
   get there eventually)
11. The future isn't done or started, so we are in the 'else' case
   
(https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n175)
12. It then is marked as 'started', and '(process-future! future)' is
   performed.
13. (process-future! ...) calls the thunk of future, and if completed,
   stores the result and returns  #true (and also signals the
   condition). In case of suspension, it marks the future as 'queued' again
14. (a) one option is that it is put back in the %futures queue (-> go
   to next iteration, in which maybe we'll hit the bug described later)
15. (b) the other option is that it is put in %futures-waiting (without
   being put in %futures)
16. When the waited-on future completes, it moves relevant futures from
   %futures-waiting to %futures

Note that none of the above steps so far depend on whether there are workers or not (except the '_maybe_ hit the bug' side remark).

But, there appears to be an incorrect check. Look at the (eq? 'done ...) line in (work):

https://cgit.git.savannah.gnu.org/cgit/guile.git/tree/module/ice-9/futures.scm#n235

|(define (work) ;; Do some work while waiting for FUTURE to complete. (lock-mutex %futures-mutex) (if (q-empty? %futures) (begin (unlock-mutex %futures-mutex) (with-mutex (future-mutex future) (unless (eq? 'done (future-state future)) (wait-condition-variable (future-completion future) (future-mutex future))))) (begin (process-one-future) (unlock-mutex %futures-mutex)))) |

The potential states are 'done', 'started' and 'queued'. The last case is done wrong (it's ok with workers, but not in general)

 * if 'done', obviously it shouldn't wait. Pretty much finished!
   (handled correctly)

 * if 'started', some thread is performing the worker. So, we should
   wait until it is completed or suspended. First case:
   (futures-completion future), second case: (futures-mutex future) 
     (handled correctly)
 * if 'queued' and there are workers, then we can wait for a worker to
   pick it up (handled correctly)  (or we could do it ourself, but
   doesn't really matter)
 * if 'queued' and there are no workers, then we have to do it ourself!
   (handled _incorrectly_!!!)

Summary: done->no wait, started->wait, queued->no wait, last case not handled generally correct.

So, (unless (eq? 'done (future-state future)) ...) needs to be changed to (when (eq? 'started (future-state future)) ...). Perhaps give that a try (instead of the %worker-count adjustment).   (^^^)

I think the issue of the current implementation is to mix the two different situations, thread-unsupported and single-CPU. The simplest fix is to make sure there is always a worker count.

No. You can't ensure there are always workers, because workers require thread support, and threads aren't always supported. (Unless you intend to move fibers into guile or something like that.)

In status quo, '(A) multi-thread support, but zero workers' and '(B) no thread support, so zero workers' presumably behave the same (because %worker-count is the same in both). So, they presumably have the same set of futures-related bugs. And while (A) can be modified to ensure >=1 workers, this can't be done for (B). So, if there is a bug, the bug remains for (B) after the 'adjust %worker-count patch'.

As mentioned previously:

Also, %worker-count can still be 0 now, in case no thread support is
available. So, when no thread support is available, the same would
presumably happen. So if it is a bug in (ice-9 futures), it is not clear
that it is actually fixed.

Also, the other change (^^^) is quite simple too, and (assuming I didn't miss something) works in general, not only in the 'has thread support' situation (unlike the %worker-count adjustment patch), and unlike your patch has no change in semantics (well, except in the sense that that bugs are unintended semantics).  So (assuming no mistakes), (^^^) is a bit simpler (no semantics change), (^^^) handles more cases and (^^^) addresses the root of the issue.

Best regards,
Maxime Devos
  • [PATCH] Fix futur... Nala Ginrut
    • Re: [PATCH] ... Maxime Devos
      • Re: [PAT... Nala Ginrut
        • Re: ... Maxime Devos
          • ... Nala Ginrut
            • ... Maxime Devos
              • ... Nala Ginrut
                • ... Maxime Devos
                • ... Nala Ginrut
                • ... Developers list for Guile, the GNU extensibility library
                • ... Maxime Devos
                • ... Developers list for Guile, the GNU extensibility library
                • ... Maxime Devos
                • ... Developers list for Guile, the GNU extensibility library
                • ... Maxime Devos
                • ... Developers list for Guile, the GNU extensibility library

Reply via email to