Re: Panther, Perl 5.8.*, threads, etc.

Dan Sugalski Thu, 17 Jul 2003 07:27:13 -0700

At 6:04 PM -0400 7/16/03, Shawn Corey wrote:

> Hi,
>
> As far as I know, fork re-uses the same program image; that's what the sticky bit is all about (see man 2 chmod).

No. The sticky bit marks the executable image as worthy of serious caching, but that's separate from fork. Don't forget that some of the pages in an executable image never appear directly in the processes that invoke it, as there is potentially writable data that needs copying in and zeroed pages that need allocating. Processes often map in read-only pages from cached images when those images are invoked, but that has nothing to do with fork. (Though it has rather a lot to do with exec.)

> It does re-create the data image for a new program.

No. No, it doesn't. Or at least, on systems with copy-on-write provided via hardware memory protection, it doesn't have to.

On systems without COW in some form, a full memory copy is generally done when the forked process is created. Memory pages tagged as read-only (possibly executable and read-only, depending on the system) will not get copied (except in really, really old fork-capable systems), and since that's where most of the executable code lives in most processes, the code doesn't get copied. Non-read-only pages, usually data, will get copied as part of the fork.

In a system with sufficient hardware support, all that happens when a new process is created via fork is that the system marks all the writable PTEs (page table entries, the things the hardware memory management unit uses to handle virtual-to-real memory address remapping and memory protection) in the parent process as read-only and copy-on-write, and then copies the PTEs into the child process. The memory pages themselves are *not* copied, just the PTEs. When either the child or the parent then writes to a protected page, that page is copied to a new segment of memory, the PTE for that page is updated to point at the copy, and the read-only and COW markers are removed. This is generally (though *not* always) less expensive than the full-copy method, as most of the writable pages in most processes don't actually get touched after forking.
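What the program itself can observe of all this is small, and a short sketch makes it concrete. The Python below is an illustration only (the function name `fork_and_modify` is made up here, and it assumes a Unix-like system where `os.fork` is available). The child overwrites a buffer after the fork and reports what it sees through a pipe; the parent's buffer is untouched. Note that this shows only the semantics. It can't tell you whether the kernel copied the page eagerly at fork time or lazily on the first write.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a buffer, return (parent_view, child_view)."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own copy of the memory.
        os.close(read_fd)
        data[:] = b"child!"
        os.write(write_fd, bytes(data))
        os._exit(0)  # exit immediately, skipping the parent's cleanup handlers

    # Parent: read what the child saw, then reap it.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    print(fork_and_modify())
```

On a COW system the write in the child is the point at which a page actually gets duplicated; on a copy-on-fork system the duplicate already existed. Either way the parent still sees its original bytes.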

Where the performance break-even between copy-on-fork and COW falls is system dependent. It depends on the support the MMU provides (without refcounts in the PTEs, both the parent and the child potentially have to copy a COW'd page they write to) and on how expensive interrupt handling is, since cloning a COW page normally requires at least some minimal amount of OS intervention, to locate a free page to copy to if nothing else.

> This leads to the confusion I've been having; how can you create a thread that's not perl? Perl (OK, advanced perl implementations) allows threads, but these threads must be within the same perl program. A thread that runs another program/script is a fork. A thread runs the same program image with the same data image. Forks (processes) run a different program image and a different (necessarily) data image. I see no advantage in creating a thread that loads a different process. Calling fork() and eval() is more understandable than threading then eval(). Could someone clear up my confusion?

You seem to have some fundamental confusion over what's going on with threads vs forks. (At least in current user-level OSes--it's all different in the embedded and research space.) I've no doubt what I wrote above won't help that. :)

A thread is just a point of execution (with all the associated bits) in a process. Multiple threads mean you have multiple execution points simultaneously in the same process. There's only one "program" loaded into a process, though it can certainly have chunks of code that are essentially independent and thus simulate multiple programs, but this isn't any different from your program having a sub that acts as if the rest of the program doesn't exist.
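For contrast with fork, here is a minimal sketch of the same experiment done with a thread, again in Python and again with a made-up function name (`thread_and_modify`). Because the thread is just another execution point in the same process, its write is visible to the code that started it. This models OS-level threads as described above; Perl 5.8's own threads give each thread its own copy of variables unless they are explicitly shared, so the equivalent Perl would behave differently.

```python
import threading


def thread_and_modify():
    """Start a thread that overwrites a buffer; return what the main thread sees."""
    data = bytearray(b"parent")

    def worker():
        # Same process, same memory: this changes the buffer the caller holds.
        data[:] = b"thread"

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # wait for the worker so the result is deterministic
    return bytes(data)


if __name__ == "__main__":
    print(thread_and_modify())
```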

A fork, on the other hand, creates an entirely new, separate process--it does *not*, however, have to run a different program. (Neither is there anything stopping a thread from running a separate program, but since doing this blows away all the threads in a process and starts fresh(ish), it's generally not done, as it's awfully drastic.) Unless something's actively done (an exec call, typically), the forked process *won't* run a different program.
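The split between fork and exec can be sketched as follows. This is an illustration under assumptions, not anything from the thread: `run_child` is a hypothetical helper, the system is assumed to be Unix-like, and the "different program" is simply a fresh copy of the Python interpreter found via `sys.executable`. Without exec the child carries on in the same program; with exec its image is replaced.

```python
import os
import sys


def run_child(use_exec):
    """Fork a child and return whatever it wrote to its standard output."""
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)  # the child's stdout now feeds the pipe
            if use_exec:
                # Replace the child's program image. On success, nothing
                # after this call ever runs in the child.
                os.execv(sys.executable,
                         [sys.executable, "-c", "print('new program')"])
            # No exec: the child is still running this very program.
            os.write(1, b"same program\n")
            os._exit(0)
        finally:
            os._exit(127)  # reached only if something above failed

    # Parent: collect the child's output until it closes the pipe.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode().strip()


if __name__ == "__main__":
    print(run_child(False))
    print(run_child(True))
```

The same `fork` call starts both children; only the explicit `execv` makes the second one run something else.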

> On Wednesday, July 16, 2003, at 04:37 PM, Dan Sugalski wrote:
>> At 1:15 PM -0700 7/16/03, Rich Morin wrote:
>>> At 8:33 PM +0100 7/16/03, David Cantrell wrote:
>>>> As far as the program is concerned, it's a complete copy.  But yes,
>>>> most modern virtual memory implementations will, I believe, do copy
>>>> on write.  I haven't actually tested this on OS X though :-)
>>> OK, I'm curious; how _would_ one go about testing this?
>> The easiest way to do so is to snag the Darwin source and take a look at some of the low-level MMU manipulation code in the kernel. It should be pretty obvious whether (though not necessarily how :) it's done.


--
                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
