I have just spent some time reading through all the discussions and the
new "threads" document and I would like to propose the addition of a new library function.
forkOS :: IO () -> IO ThreadID
Something like that is already in the proposal, only it's currently called forkBoundThread and it doesn't return the ThreadID (that can be changed, though).
With this, I also propose that "forkIO" always runs a Haskell thread in the same OS thread that the current Haskell thread runs in.
(i.e. "forkIO": same OS thread, "forkOS": new OS thread)
In the proposal we wrote:
"The specification shouldn’t explicitly require lightweight “green” threads
to exist. The specification should be implementable in a simple and obvious
way in haskell systems that always use a 1:1 correspondence between
Haskell threads and OS threads."
The idea was that lightweight ("green") threads are an optimization only (do they have any other advantage?), not a language feature, and that implementations of Haskell should not be forced to support a complex thread management system.
Your proposal obviously contradicts this.
What is the advantage of explicitly requiring one OS thread to execute (the foreign calls made by) several Haskell threads?
So far, I was only able to think of two possible situations:
a) The foreign functions don't care what thread they are called from
In that case, I would like the implementation to run my Haskell threads in the most efficient way possible. Currently, that means scheduling them all in one OS thread, but that is an implementation detail that I don't want to care about when I'm writing a normal application. On a four-processor-SMP machine, the most efficient way is to run them simultaneously in four OS threads (no implementation currently supports this, but there's experimental code in the GHC repository).
b) The foreign functions do care what thread they are called from
In that case I want the implementation to have an exact correspondence between Haskell threads and OS threads. I just want to think about "one thread", and I don't want to manage some correspondence between Haskell threads and OS threads manually.
Using the new primitive, we can view the new "threadsafe" keyword as syntactic sugar:
foreign import threadsafe foo :: Int -> IO Int
===>
foo :: Int -> IO Int
foo i = threadSafe (primFoo i)
foreign import "foo" primFoo :: Int -> IO Int
where
threadSafe :: IO a -> IO a
threadSafe io = do
  result <- newEmptyMVar
  forkOS (do { x <- io; putMVar result x })
  takeMVar result
That looks dangerous:
I want to call both threadsafe imports and unsafe imports from a "bound thread", and I expect all foreign calls from a bound thread to be executed from the same OS thread (by the definition of a "bound thread"). This implementation of "threadsafe" always uses another (new or pooled) OS thread for the threadsafe call.
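A small sketch of that concern, written against the Control.Concurrent module that ships with GHC's base today rather than against the proposal's own names. It assumes forkOS, myThreadId and rtsSupportsBoundThreads from that module; callerAndCallee is a hypothetical helper introduced only for this illustration. The wrapped action always observes a different thread from the one that called it:

```haskell
import Control.Concurrent
  (ThreadId, forkIO, forkOS, myThreadId, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | The combinator quoted above: run the action in a freshly forked
-- thread and wait for its result.
threadSafe :: IO a -> IO a
threadSafe io = do
  result <- newEmptyMVar
  -- Assumption: fall back to forkIO when the RTS has no OS-thread
  -- support, because forkOS fails at run time in that case.
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (io >>= putMVar result)
  takeMVar result

-- | Hypothetical helper: returns the caller's thread and the thread
-- that actually executed the wrapped action.
callerAndCallee :: IO (ThreadId, ThreadId)
callerAndCallee = do
  caller <- myThreadId
  callee <- threadSafe myThreadId
  return (caller, callee)
```

The two ThreadIds differ, so a foreign call placed inside threadSafe would not run on the thread that a bound caller is tied to.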
getOSThread :: ThreadID -> OSThreadID
forkIOIn :: OSThreadID -> IO () -> IO ThreadID
Why should the RTS do inter-OS-thread messaging for us?
I have the feeling that it is not difficult to implement "forkOS" and family
once the runtime system has been upgraded to support multiple OS threads.
Wolfgang, you seem to be the expert on the OS thread area, would it be hard?
It would definitely be more difficult to implement in GHC than the current proposal, but it could be done. In fact I think that implementing it would be more fun for me than having to use it afterwards.
I am not saying that we should discard the "threadsafe" keyword as it might
be a useful shorthand, but I think that it is in general a mistake to try to keep the management of OS threads implicit -- don't use new keywords, add combinators to implement them!
Management of OS threads _should_ be kept implicit. Ideally, the user should never notice that the GHC runtime is using green threads internally.
I feel that the following has happened; urk, we need some way of keeping haskell threads running while calling C; we add "threadsafe"; whoops, sometimes
a function expects that it is run in the same OS thread; we add "bound"; whoops, sometimes functions expect to be run from a specific OS thread... unsolved??
Not unsolved. Use Control.Concurrent.Chan :-)
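One way to read that suggestion, as a minimal sketch: a single dedicated thread takes IO actions from a Chan and runs them, so every action sent to it executes on that one thread. Worker, startWorker, runOn and workerThreadId are hypothetical names for this illustration; forkOS and rtsSupportsBoundThreads are assumed from the Control.Concurrent module in GHC's base today, not from the proposal.

```haskell
import Control.Concurrent
  (ThreadId, forkIO, forkOS, myThreadId, rtsSupportsBoundThreads)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forever)

-- | Handle to a thread that executes every job sent to it, in order.
newtype Worker = Worker (Chan (IO ()))

-- | Start the dedicated thread. Assumption: use a bound thread when the
-- RTS supports it, otherwise fall back to a plain Haskell thread.
startWorker :: IO Worker
startWorker = do
  jobs <- newChan
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (forever (do job <- readChan jobs; job))
  return (Worker jobs)

-- | Run an action on the worker's thread and wait for the result.
-- Exceptions are caught on the worker and rethrown in the caller, so a
-- failing job does not kill the worker.
runOn :: Worker -> IO a -> IO a
runOn (Worker jobs) action = do
  reply <- newEmptyMVar
  writeChan jobs (tryAny action >>= putMVar reply)
  takeMVar reply >>= either throwIO return

tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- | The thread on which the worker runs its jobs.
workerThreadId :: Worker -> IO ThreadId
workerThreadId w = runOn w myThreadId
```

Calling workerThreadId twice on the same Worker gives the same ThreadId, which is the property a library with a "call me from this specific thread" requirement needs.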
Before we know it, we have added tons of new keywords to solve the wrong problem.
The problem is that some Haskell implementations try to optimize concurrency by doing the scheduling themselves. We have to provide hints (threadsafe and bound) to the implementation to specify just how much it is allowed to optimize. We should never be required to explicitly do the "optimization" in the source code. It will break with SMP implementations (which I expect to be using in a few years), because different optimizations are required: suddenly it will be desirable to have multiple OS threads for performance reasons.
Maybe it is time to take a step back and use a somewhat lower level model with
two fork variants: "forkIO" (in the same OS thread) and "forkOS" (in a new OS thread).
It seems that none of the above problems occur when having explicit control.
In general it seems that OS threads are a resource that is too subtle to be managed automatically as they have a profound impact on how libraries are used and applications are structured.
My recipe:
1) Mark all your foreign imports as threadsafe
2) Mark foreign imports that are guaranteed to only need a short amount of time (<50ms at most, I'd say) and that won't call back to Haskell, as unsafe
3) Just pretend that every Haskell thread is an OS thread
4) If you're using libraries that rely on thread-local state (and therefore can find out that point 3 might not be strictly true), add "bound" to your foreign exports or wrap your IO actions in forkBoundThread.
I don't see any remaining problems, and it looks simpler to me than managing Haskell threads and OS threads explicitly.
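For illustration only, here is roughly how that recipe could look with the FFI as GHC implements it today. This is a sketch under several assumptions: current GHC has no "threadsafe" keyword, and a "safe" import is used here in its place; runInBoundThread from Control.Concurrent stands in for forkBoundThread; the C functions cos and sleep assume a POSIX system; withThreadLocalState is a hypothetical helper name.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent
  (isCurrentThreadBound, rtsSupportsBoundThreads, runInBoundThread)
import Foreign.C.Types (CDouble (..), CUInt (..))

-- Point 2: short, and never calls back into Haskell, so "unsafe" is fine.
foreign import ccall unsafe "math.h cos" c_cos :: CDouble -> CDouble

-- Point 1: may block for a long time, so it must not stall other
-- Haskell threads. Assumption: "safe" plays the role of "threadsafe".
foreign import ccall safe "unistd.h sleep" c_sleep :: CUInt -> IO CUInt

-- Point 4: wrap actions that touch thread-local state so that all their
-- foreign calls come from one OS thread. Without OS-thread support in
-- the RTS there is only one OS thread anyway, so run the action as is.
withThreadLocalState :: IO a -> IO a
withThreadLocalState action
  | rtsSupportsBoundThreads = runInBoundThread action
  | otherwise = action
```

Point 3 needs no code at all, which is rather the point of the recipe.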
OK, enough talking about why I like my ideas better than yours ;-) , I still have a few questions:
What would a safe foreign import do in your proposal?
What other Haskell threads would be blocked when I call a safe foreign import? When would they be unblocked again? What happens when the foreign import calls back to Haskell? Or would they be blocked at all? For a Haskell implementation that runs in one thread and just uses separate threads for foreign calls, it may be natural to not block until somebody tries to make another foreign call in the same OS thread...
And how would the whole thing work in an SMP Haskell system?
When writing programs and libraries, how would I manage the complex interactions that could happen? How can a library make use of forkIO if it doesn't know what _other_ Haskell threads might be running in the same OS thread?
Currently, I can also use the threadsafe attribute for long calculations outside the IO monad, i.e.
foreign import ccall threadsafe doSomeTerriblyComplicatedCalculation :: Double -> Double
If one Haskell thread evaluated this, all other Haskell threads would keep running. Do we need to use unsafePerformIO again in your proposal?
Simon Marlow wrote:
I'm basing this on two assumptions: (a) switching OS threads is expensive and (b) threadsafe foreign calls are common. I could potentially be wrong on either of these, and I'm prepared to be persuaded. But if both (a) and (b) turn out to be true, then worse is worse in this case.
a) Is probably true. If it was absolutely wrong, we could do away with all the complexity, use one OS thread for each Haskell thread and shorten the GHC RTS by a few thousand lines...
b) Personally, I would want to use them for every foreign call that is not guaranteed to finish within, say, 50ms, so it will certainly be true for _my_ programs.
I would expect all libraries that I want to use to do this, too, because otherwise my program might unexpectedly be blocked by threads spawned by the library.
Cheers,
Wolfgang
_______________________________________________ FFI mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/ffi
