Re: pthreads question

Malcolm Kavalsky Fri, 15 Mar 2002 22:35:52 -0800

There are a few points that I would like to make about the latest replies
to my "don't use threads" message.

Nadav Har'El wrote:

>Threads, especially on Linux (the common "LinuxThreads" implementation)
>are the same thing as processes, except that two things are shared between all
>of them:
>
> 1. All the memory is shared (there is a separate stack for each thread,
>    but threads can still reach another thread's stack via pointers).
>
> 2. All file descriptors are shared.
>
>(see clone(2) for more information).
>
>So if you find yourself, in writing (or designing) a multi-process program
>doing so much shared memory that you start wishing malloc() would just give
>you shared memory, if you find yourself sending file descriptors to other
>processes (over unix domain sockets, that's the way to do that...), if you
>find yourself using a lot of semaphores (arrg, those System V semaphores are
>annoying...) to do mutual-exclusion or wakeups on these shared memory areas -
>well... threads might be a more appropriate framework for your program.
>
Time for another bombshell. Ready, set, go:

    "Try to avoid using shared memory and System V semaphores!"

Before you attack me, try the following test:

1. malloc() a 100 MB buffer and fill it with random data
2. Send the buffer over a Unix socket to another process
3. Time how long it takes to send the data ...

The result will be that it takes almost no time, certainly less than copying
the buffer at memory speed would. This is because the data is not actually
copied; the OS uses clever pointer manipulation, copy-on-write, etc. to
speed up operations like this.
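
A minimal sketch of that test, for anyone who wants to run it rather than
take it on faith. It is written in Python for brevity; the function names
are made up for this illustration, and the in-memory copy is only a rough
baseline. The numbers depend on the kernel and hardware, so none are
quoted here.

```python
import os
import socket
import time


def time_socket_transfer(nbytes, chunk=1 << 16):
    """Send nbytes of random data to a forked child over a Unix socket pair.

    Returns (seconds_elapsed, bytes_the_child_reported_receiving).
    The clock stops only after the child has confirmed the total, so the
    figure covers delivery and not just the sender's side.
    """
    buf = os.urandom(nbytes)
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: read until EOF, then report how many bytes arrived.
        try:
            parent_sock.close()
            total = 0
            while True:
                data = child_sock.recv(chunk)
                if not data:
                    break
                total += len(data)
            child_sock.sendall(total.to_bytes(8, "big"))
            child_sock.close()
        finally:
            os._exit(0)

    # Parent: send everything, signal EOF, wait for the child's byte count.
    child_sock.close()
    start = time.perf_counter()
    parent_sock.sendall(buf)
    parent_sock.shutdown(socket.SHUT_WR)
    reply = b""
    while len(reply) < 8:
        part = parent_sock.recv(8 - len(reply))
        if not part:
            break
        reply += part
    elapsed = time.perf_counter() - start
    parent_sock.close()
    os.waitpid(pid, 0)
    return elapsed, int.from_bytes(reply, "big")


def time_memory_copy(nbytes):
    """Time one in-process copy of an nbytes buffer, as a rough baseline."""
    src = bytearray(os.urandom(nbytes))
    start = time.perf_counter()
    dst = bytes(src)  # bytes() of a bytearray makes a real copy
    elapsed = time.perf_counter() - start
    return elapsed, len(dst)
```

Calling time_socket_transfer(100 * 1024 * 1024) and
time_memory_copy(100 * 1024 * 1024) runs the test at the size proposed
above and gives the two figures to compare.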

What do we gain by passing large data structures between processes instead
of using shared memory? Simple synchronization and protection of data.

Another alternative, which I like, is to write a separate server process
which manages the data structure, with all processes that want to access
the data acting as clients of that process. Once again, the synchronization
is simple, and you avoid deadlock and starvation.
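
As a rough sketch of that server pattern (Python for brevity; the counter,
the "incr"/"get"/"quit" commands and all function names are invented for
this illustration): one process owns the data and handles requests one at
a time, so the clients need no locks of their own.

```python
import os
import socket
import tempfile


def serve(listener):
    """Own the counter; handle one request per connection, strictly in order."""
    counter = 0
    while True:
        conn, _ = listener.accept()
        with conn:
            # Requests are a few bytes long, so one recv() is assumed to
            # deliver the whole command in this sketch.
            cmd = conn.recv(64).decode().strip()
            if cmd == "quit":
                conn.sendall(b"bye\n")
                return
            if cmd == "incr":
                counter += 1
            # Both "incr" and "get" reply with the current value.
            conn.sendall(f"{counter}\n".encode())


def request(path, cmd):
    """Client side: connect, send one command, return the one-line reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(cmd.encode() + b"\n")
        return s.recv(64).decode().strip()


def demo(n_clients=4, incrs_each=25):
    """Fork one server and n_clients clients; return the final counter."""
    sock_dir = tempfile.mkdtemp()
    path = os.path.join(sock_dir, "counter.sock")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(16)  # listening before fork, so clients can connect at once

    server_pid = os.fork()
    if server_pid == 0:
        try:
            serve(listener)
        finally:
            os._exit(0)
    listener.close()

    client_pids = []
    for _ in range(n_clients):
        pid = os.fork()
        if pid == 0:
            try:
                for _ in range(incrs_each):
                    request(path, "incr")
            finally:
                os._exit(0)
        client_pids.append(pid)
    for pid in client_pids:
        os.waitpid(pid, 0)

    final = int(request(path, "get"))
    request(path, "quit")
    os.waitpid(server_pid, 0)
    os.unlink(path)
    os.rmdir(sock_dir)
    return final
```

With the defaults, demo() forks four clients that each send 25 increments,
and the value read back at the end is 100, because every update passes
through the single server loop.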

Of course there are many other patterns/models (also in ACE) which
accommodate similar situations and save you from inventing your own, such
as single writer, multiple readers, etc.

>
>Moreover, threads can yield real performance benefits on SMP machines.
>Again, only if you know what you're doing (threads aren't one of those "let's
>just write a few random lines of code and see if it works" paradigms).
>
I think that it is far easier for the OS to take advantage of SMP machines
using multiple processes than multiple threads. Problems with cache
coherency are trickier with threads than with processes, which have much
less data in common.

>It is indeed hard to debug a multithreaded program, more than to debug a
>single-process program, but not necessarily harder to debug than a multi-
>process program where the processes actually communicate a lot and have a lot
>of shared memory and semaphores set up.
>
>Comparing a multithreaded program to an "embarrassingly parallel" program (a
>multi-process program whose processes just work alone and never communicate)
>is irrelevant. You wouldn't use threads to implement embarrassingly parallel
>programs; this is also why processes (rather than threads) can make a lot
>of sense when implementing web servers and the like (of course both processes
>and threads have a severe limitation when implementing a web server, but
>that's an issue for another post).
>

The main difference between a multi-threaded process and multiple
single-threaded processes which communicate (say, over sockets) is that you
can debug each process in its own gdb window, single-stepping and setting
breakpoints with deterministic results. When you set a breakpoint in a
multi-threaded process, you can never know what happened in the time before
the breakpoint was reached, since the scheduling of the threads is
nondeterministic. One solution to this problem would be to synchronize your
threads in some manner; however, this is probably not always the right
thing to do.

>>Note also that C++ has certain effects that make use of threads dangerous
>>
>What kind of effects? The threaded program I mentioned was in C++, and
>not only I didn't see any ill-effects, I actually was very happy I chose
>C++ to program it: One of the dangers of threads is that you have a "too
>easy" access to variables your thread was not supposed to access (or
>access without holding a lock), and C++ makes it very easy to force you to,
>say, access some variable only through a method which also grabs a lock.
>
Once programmers get intoxicated with threads, I have noticed that at every
opportunity, instead of just calling a function, they spawn a thread to
execute it, hoping that it will run in parallel (to save time). In the
context of C++ or any OOP language, this means that you will find methods
of an object being run as threads. Of course, all the data in the object is
shared and now needs to be protected, which again slows down every use of
the object.
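
To make the locking concrete, here is a minimal sketch of an object whose
data can only be reached through methods that take a lock, as the quoted
post describes doing in C++. Python is used for brevity, and the Account
class and run() helper are hypothetical names chosen for this example.
Every call pays for acquiring the lock, which is the cost referred to
above.

```python
import threading


class Account:
    """The balance is reachable only through methods that hold the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._balance = 0

    def deposit(self, amount):
        # The read-modify-write is done under the lock, so concurrent
        # deposits cannot overwrite each other.
        with self._lock:
            self._balance += amount

    def balance(self):
        with self._lock:
            return self._balance


def run(n_threads=8, deposits_each=10_000):
    """Have n_threads threads deposit 1 unit deposits_each times each."""
    account = Account()

    def worker():
        for _ in range(deposits_each):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance()
```

With the lock in place, run(n, k) returns n * k regardless of how the
threads are scheduled.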

Obviously, this is not a problem specific to C++, Java, or any other OOP
language with threads; bad code can be written in any language. But young
programmers need to be aware of the pitfalls. Even more important, if you
are a software team leader, expect to run into these situations when
helping to debug code.

>>and
>>any library calls that you use automatically, need to be checked that 
>>they are MT-safe.
>>
>
>Luckily this is not a problem any more for glibc except in a small number
>of functions (say inet_ntoa) whose manual says the return value is a
>statically allocated buffer (so called "non-reentrant" routines).
>
>Until a few years ago, this was a serious problem in most Unix versions
>and Linux, which is why threaded programming was almost unheard of in
>the Unix world.
>
>>and the OS protects each task from the other. You need to work a little 
>>harder in the
>>beginning to setup the IPC, but once that is done, you are home free, 
>>
>
>Again, if you use a lot of shared memory and so on, this begins to become
>annoying. Using shared memory you need to allocate fixed-size shared memory
>areas and "allocate" place in them or use some sort of special allocation
>library like "mm". It's doable, but isn't as easy as just doing malloc()
>as in threads.
>
>Also, what if a library function returns an malloc()ed area, and you need
>it to be allocated in shared memory instead? See - badly designed libraries
>can hurt you not only if you are using threads.
>

More reasons why I don't like simplistic usage of shared memory without
serious wrapping.

>
>
>>Most windows programmers that I have met, are used to working with 
>>threads, and it
>>is hard to change their habits to use processes.
>>
>
>Supposedly Windows' process implementation sucks (or sucked?) bigtime,
>being very inefficient, which is why Windows programmers became used to
>programming only with threads. Compare this to Unix's (or Linux's) thread
>implementation sucking bigtime until a few years ago which is why Unix
>programmers became used to programming only with processes.
>
>Both "fanaticisms" are silly. You should know about both methods and use
>the one that best fits your needs. I don't know if you'll end up with
>a 10%-90% multiprocess/multithread ratio, 50%-50%, or 90%-10% - what I do
>know is that most programs should be neither threaded nor multi-process at
>all...
>
I apologize if I sound fanatical; that is definitely not my intent. I tend
to exaggerate in order to make my point more forcefully. I programmed with
threads for 5 years, since the realtime OS that I was using didn't support
processes. Since moving to Linux (using it as an embedded OS), I really
enjoy the benefits and peace of mind that I get from using multiple
processes instead.

You can program the same application in any number of ways: OOP,
procedural, multithreaded, multiprocess, shared memory, sockets, etc. We
have a very rich set of tools from which to choose. After X years of
programming you get comfortable with certain methodologies, and are not
always open to other ways. As a well-rounded programmer designing a
software architecture, you do want to try different approaches before
settling on the final design. You also need to take pitfalls into account
and protect yourself from them by using well-tried and "foolproof"
mechanisms, especially when you have a team of programmers.

Malcolm

