On Fri, Mar 15, 2002, Malcolm Kavalsky wrote about "Re: pthreads question":
> Linux has a very efficient process model and if you are writing any
> reasonable size program and don't want to get into trouble, then I
> suggest you split it into multiple processes and use IPC.
Threads, especially on Linux (the common "LinuxThreads" implementation), are essentially the same thing as processes, with two things shared between all of them:

1. All the memory is shared (there is a separate stack for each thread, but threads can still reach another thread's stack via pointers).
2. All file descriptors are shared.

(See clone(2) for more information.)

So if, while writing (or designing) a multi-process program, you find yourself using so much shared memory that you start wishing malloc() would just give you shared memory; if you find yourself sending file descriptors to other processes (over Unix-domain sockets, that's the way to do that...); if you find yourself using a lot of semaphores (argh, those System V semaphores are annoying...) to do mutual exclusion or wakeups on these shared memory areas - well, threads might be a more appropriate framework for your program. Moreover, threads can yield real performance benefits on SMP machines. Again, only if you know what you're doing (threads aren't one of those "let's just write a few random lines of code and see if it works" paradigms).

> In my 20 years of programming experience, there are very few cases which I
> have come across that warrant the use of threads.

I agree. But I *did* use them, and found that when you know what you're doing, you can actually get beautiful results.

> The initial appeal of using threads (easy sharing of global data structures,
> concurrent programming, low overhead task switching) is quickly dispelled
> the minute you start wondering why your program is crashing/dead-locking.

It is indeed harder to debug a multithreaded program than a single-process program, but not necessarily harder than a multi-process program where the processes actually communicate a lot and have a lot of shared memory and semaphores set up.
Comparing a multithreaded program to an "embarrassingly parallel" program (a multi-process program whose processes just work alone and never communicate) is irrelevant: you wouldn't use threads to implement embarrassingly parallel programs. This is also why processes (rather than threads) can make a lot of sense when implementing web servers and the like (of course, both processes and threads have a severe limitation when implementing a web server, but that's an issue for another post).

> After adding in lots of mutexes to protect all your data structures,
> your program slows to a crawl, and tracing it reveals that 99% of the
> time is spent in lock/unlock calls.

This would never happen if you actually understand what you're doing: not just the syntax of the API, but also the reasons why things should be done the way they are. Students might want to check out courses like "Distributed and Parallel Programming" in the Technion, or you can read a good book to get a feel for the theory.

I have written a relatively large threaded program (there were about 8 threads doing different things), and my experience is that I always understood why and where mutexes and condition variables were needed, and I never had to go back and add them in places I forgot. The locking overhead was minimal because I designed the program that way (you lock mutexes only in really critical sections and try to work on local variables most of the time).

You're right, though, that a bad programmer might make a real mess of things by using threads (where all memory is shared) and will have a really hard time debugging...

> Note also that C++ has certain effects that make use of threads
> dangerous

What kind of effects?
The threaded program I mentioned was in C++, and not only did I not see any ill effects, I was actually very happy I chose C++ to write it. One of the dangers of threads is "too easy" access to variables your thread was not supposed to access (or access without holding a lock), and C++ makes it very easy to force you to, say, access some variable only through a method which also grabs a lock.

> and any library calls that you use automatically, need to be checked
> that they are MT-safe.

Luckily, this is no longer a problem for glibc, except in a small number of functions (say, inet_ntoa) whose manual says the return value is a statically allocated buffer (so-called "non-reentrant" routines). Until a few years ago this was a serious problem in most Unix versions and in Linux, which is why threaded programming was almost unheard of in the Unix world.

> and the OS protects each task from the other. You need to work a little
> harder in the beginning to setup the IPC, but once that is done, you are
> home free,

Again, if you use a lot of shared memory and so on, this begins to become annoying. With shared memory you need to allocate fixed-size shared memory areas and "allocate" space in them yourself, or use some sort of special allocation library like "mm". It's doable, but isn't as easy as just calling malloc() as you would with threads. Also, what if a library function returns a malloc()ed area, and you need it to be allocated in shared memory instead? See, badly designed libraries can hurt you not only when you are using threads.

> Most windows programmers that I have met, are used to working with
> threads, and it is hard to change their habits to use processes.

Supposedly, Windows' process implementation sucks (or sucked?) bigtime, being very inefficient, which is why Windows programmers became used to programming only with threads.
Compare that to the mirror image: Unix's (or Linux's) thread implementation sucked bigtime until a few years ago, which is why Unix programmers became used to programming only with processes. Both "fanaticisms" are silly. You should know both methods and use the one that best fits your needs. I don't know if you'll end up with a 10%-90% multiprocess/multithread ratio, 50%-50%, or 90%-10%; what I do know is that most programs should be neither threaded nor multi-process at all...

> did then the context switches were too high. Linux, on the other hand,
> has an excellent low-overhead process model which removes 99% of the
> reason to use threads.

Right, but I don't agree with the 99% figure. As I said, not everybody wants to use threads just for lower overhead in context switching.

-- 
Nadav Har'El                        | Friday, Mar 15 2002, 2 Nisan 5762
[EMAIL PROTECTED]                   |-----------------------------------------
Phone: +972-53-245868, ICQ 13349191 |The knowledge that you are an idiot, is
http://nadav.harel.org.il           |what distinguishes you from one.