On Mon, 18 Apr 2011 12:56:39 +0200
Ertugrul Soeylemez <e...@ertes.de> wrote:
Mike Meyer <m...@mired.org> wrote:
On Mon, 18 Apr 2011 11:07:58 +0200
Johan Tibell <johan.tib...@gmail.com> wrote:
On Mon, Apr 18, 2011 at 9:13 AM, Mike Meyer <m...@mired.org> wrote:
I always looked at it the other way 'round: threading is a hack to
deal with system inadequacies like poor shared memory performance
or an inability to get events from critical file types.
Real processes and event-driven programming provide more robust,
understandable and scalable solutions.
<end rant>
We need to keep two things separate: threads as a way to achieve
concurrency and as a way to achieve parallelism [1].
Absolutely. Especially because you shouldn't have to deal with
concurrency if all you want is parallelism. Your reference [1] covers
why this is the case quite nicely (and is essentially the argument for
"understandable" in my claim above).
You also don't need Emacs/Vim, if all you want is to write a simple
plain text file. There is nothing wrong with concurrency, because you
are confusing the high level model with the low level implementation.
Concurrency is nothing but a design pattern, and GHC shows that a high
level design pattern can be mapped to efficient low level code.
Possibly true. The question is - can it be mapped to a design that's
as robust and scalable as the ones I'm used to working on?
In Haskell you should not use explicit, manual OS threading/forking for
the same reason you shouldn't write machine code manually.
That's a good thing - providing it doesn't compromise robustness and
scalability.
It's useful to use non-determinism (i.e. concurrency) to model a
server processing multiple requests. Since requests are independent
and shouldn't impact each other we'd like to model them as
such. This implies some level of concurrency (whether using threads
or processes).
But because the requests are independent, you don't need concurrency
in this case - parallelism is sufficient.
Perhaps Haskell is the wrong language for you. How about programming in
C/C++? I think you want more control over low level resources than
Haskell gives you. But I suggest having a closer look at concurrency.
Personally, I don't want to have to worry about low-level resources,
or even concurrency. Having to do so feels too much like having to
explicitly allocate and free memory, or worry about register
allocations. But if I have to do those things to get robustness and
scalability until the languages start being able to deal with it, then
I need the RTS to get out of the way and let me do my job.
If I'm using a value that needs protection from concurrent access
without providing that protection, I want the system to give me an
error. At run-time is acceptable, but compile time is better. I want
the system to make sure the concurrent protection mechanisms work
properly - no deadlocks, no stuck processes, etc - without my having to
do anything but indicate which values need such protection.
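One way to approximate "just indicate which values need protection" in GHC is software transactional memory. The sketch below is only an illustration, not something proposed in this thread: the names `transfer` and `totalAfter` are made up for the example, and the STM primitives are imported from `GHC.Conc` in base so the block is self-contained (the stm package's `Control.Concurrent.STM` exports the same names). A `TVar` can only be read or written inside `STM`, so touching it outside a transaction is a compile-time type error. This covers part of the wish list; a transaction that calls `retry` can still wait forever.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Move an amount between two protected values as one atomic transaction.
-- (Hypothetical helper for this example.)
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  a <- readTVar from
  b <- readTVar to
  writeTVar from (a - amount)
  writeTVar to (b + amount)

-- Start n threads that shuffle units back and forth, wait for all of
-- them, then read both values in a single transaction and add them up.
totalAfter :: Int -> IO Int
totalAfter n = do
  x <- newTVarIO 1000
  y <- newTVarIO 0
  done <- newEmptyMVar
  forM_ [1 .. n] $ \i -> forkIO $ do
    atomically (if even i then transfer x y 1 else transfer y x 1)
    putMVar done ()
  replicateM_ n (takeMVar done)
  atomically ((+) <$> readTVar x <*> readTVar y)

main :: IO ()
main = totalAfter 200 >>= print
```

Because every transfer is atomic, the sum of the two values is the same after any interleaving of the threads.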
The unix process model works quite well. Compared to a threaded model,
this is more robust (if a process breaks, you can kill and restart it
without affecting other processes, whereas if a thread breaks,
restarting the process and all the threads in it is the only safe
option) and scalable (you're already doing ipc, so moving processes
onto more systems is easy, and trivial if you design for it). The
events handled by a single process are simple enough that your
callback/event spaghetti can line up in nice, straight strands.
When writing concurrent code you don't care about how the RTS maps it to
processes and threads. GHC chose threads, probably because they are
faster to create/kill and consume less memory. But this is an
implementation detail the Haskell developer should not have to worry
about.
So - what happens when a thread fails for some reason? I'm used to
dealing with systems that run 7x24 for weeks or even months on
end. Hardware hiccups, network failures, bogus input, hung clients,
etc. are all just facts of life. I need the system to keep running
properly in the face of all those, and I need them to disrupt the
world as little as possible.
Given that the RTS has taken control over this stuff, I sort of expect
it to take care of noticing a dead process and restarting it as
well. All of which is fine by me.
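For reference, GHC's RTS does not restart a Haskell thread that dies: an uncaught exception simply ends that thread. Restarting has to be written in the program. The following is a minimal sketch of that idea using only base; `supervise` and `flaky` are hypothetical names invented for this example, and a real supervisor would want back-off, logging, and more careful treatment of asynchronous exceptions.

```haskell
import Control.Exception (SomeException, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Run an action, running it again after any exception, for at most
-- `attempts` tries in total. (Hypothetical helper for this example.)
-- Note: catching SomeException also catches asynchronous exceptions,
-- which a production supervisor would normally re-throw.
supervise :: Int -> IO a -> IO (Either SomeException a)
supervise attempts action = do
  r <- try action
  case r of
    Left _ | attempts > 1 -> supervise (attempts - 1) action
    _ -> pure r

-- A stand-in worker that fails on its first two calls, then succeeds.
flaky :: IORef Int -> IO String
flaky ref = do
  n <- atomicModifyIORef' ref (\k -> (k + 1, k + 1))
  if n < 3
    then ioError (userError ("attempt " ++ show n ++ " failed"))
    else pure "ok"

main :: IO ()
main = do
  ref <- newIORef 0
  result <- supervise 5 (flaky ref)
  putStrLn (either (const "gave up") id result)
```

The same `supervise` call can be wrapped in `forkIO` so that each worker thread is restarted independently of the others.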
We don't need to do this. We can keep a concurrent programming model
and get the execution efficiency of an event driven model. This is
what GHC's I/O manager achieves. On top of that we also get
parallelism for free. Another way to look at it is that GHC provides
the scheduler (using a thread for the event loop and a separate
worker pool) that you end up writing manually in event driven
frameworks.
So my question is - can I still get the robustness/scalability
features I get from the unix process model using haskell? In
particular, it seems like ghc starts threads I don't ask it to, and
using both threads & forks for parallelism causes even more headaches
than concurrency (at least on unix & unix-like systems), so just
replicating the process model won't work well. Do any of the haskell
parallel processing tools work across multiple systems?
Effectively no (unless you want to use the terribly outdated GPH
project), but that's a shortcoming of the current RTS, not of the design
patterns you use in Haskell. By design Haskell programs are well suited
for an auto-distributing RTS. It's just that no such RTS exists for
recent versions of the common compilers.
So is anyone working on such a package for haskell? I know clojure's
got some people working on making STM work in a distributed
environment, but that's outside the goals of the core team.
In other words: Robustness and scalability should not be your business
in Haskell. You should concentrate on understanding and using the
concurrency concept well. And just to encourage you: I write
productive concurrent servers in Haskell, which scale very well and
probably better than an equivalent C implementation would. Reason: A
Haskell thread is not mapped to an operating system thread (unless you
used forkOS). When it is advantageous, the RTS can well decide to let
another OS thread continue a running Haskell thread. That way the
active OS threads are always utilized as efficiently as possible. It
would be a pain to get something like that with explicit threading and
even more, when using processes.
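To make the lightweight-thread point concrete, here is a small sketch using only base. It starts ten thousand Haskell threads with `forkIO`, each of which bumps a shared counter and reports back; `runWorkers` is a hypothetical name chosen for this example.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forM_, replicateM_)

-- Spawn n Haskell threads; each adds 1 to a shared counter and then
-- signals completion. Returns the final counter value.
-- (Hypothetical helper for this example.)
runWorkers :: Int -> IO Int
runWorkers n = do
  counter <- newMVar 0
  done <- newEmptyMVar
  forM_ [1 .. n] $ \_ -> forkIO $ do
    modifyMVar_ counter (pure . (+ 1))
    putMVar done ()
  -- Wait for one completion signal per worker.
  replicateM_ n (takeMVar done)
  readMVar counter

main :: IO ()
main = runWorkers 10000 >>= print
```

None of these threads is tied to an OS thread of its own; the RTS multiplexes them over however many OS threads it was started with.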
Well, *someone* has to worry about robustness and scalability. Users
notice when their two minute system builds start taking four minutes
(and will be at my door wanting me to fix it) because something didn't
scale fast enough, or have to be run more than once because a failing
component build wasn't restarted properly. I'm willing to believe that
haskell lets you write more scalable code than C, but C's tools for
handling concurrency suck, so that should be true in any language
where someone actually thought about dealing with concurrency beyond
locks and protected methods. The problem is, the only language I've
found where that's true that *also* has reasonable tools to deal with
scaling beyond a single system is Eiffel (which apparently abstracts
things even further than haskell - details like how concurrency is
achieved or how many concurrent operations you can have are configured
when you start an application, *not* when writing it). Unfortunately,
Eiffel has other problems that make it undesirable.
That's why the RTS lets you choose the number of OS threads only instead
of giving you low level control over the threads. It spawns as many
threads as you ask it to spawn and manages them with its own strategy.
The only way to manipulate this strategy is by deciding whether a
particular Haskell thread is bound (forkOS) or not (forkIO).
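A short sketch of the bound/unbound distinction, assuming GHC and only base. With the `-threaded` runtime, `+RTS -N<n>` sets the number of capabilities, which `getNumCapabilities` reports. `boundIn` is a hypothetical helper written for this example; it runs `isCurrentThreadBound` inside a thread created by whichever fork function it is given. `forkOS` is only attempted when `rtsSupportsBoundThreads` is true, since it fails on the non-threaded runtime.

```haskell
import Control.Concurrent

-- Ask, from inside a freshly forked thread, whether that thread is
-- bound to an OS thread. (Hypothetical helper for this example.)
boundIn :: (IO () -> IO ThreadId) -> IO Bool
boundIn fork = do
  box <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar box)
  takeMVar box

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps)
  unbound <- boundIn forkIO
  putStrLn ("forkIO thread bound? " ++ show unbound)
  if rtsSupportsBoundThreads
    then boundIn forkOS >>= \b -> putStrLn ("forkOS thread bound? " ++ show b)
    else putStrLn "runtime built without -threaded; forkOS unavailable"
```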
Does the programmer have to worry about such trivia as the number of
threads to use?
<mike
--
Mike Meyer <m...@mired.org> http://www.mired.org/consulting.html
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell