Author: Whiteknight
Date: Wed Dec  3 13:01:17 2008
New Revision: 33462

Modified:
   trunk/docs/book/ch08_architecture.pod

Log:
[Book] More updates to chapter 8, fixing some wording and making things clearer.

Modified: trunk/docs/book/ch08_architecture.pod
==============================================================================
--- trunk/docs/book/ch08_architecture.pod       (original)
+++ trunk/docs/book/ch08_architecture.pod       Wed Dec  3 13:01:17 2008
@@ -573,9 +573,9 @@
 Z<CHP-7-SECT-4.1>
 
 Parrot's base X<I/O;Parrot> I/O system is fully X<asynchronous I/O>
-asynchronous I/O with callbacks and per-request private data. Since this
+asynchronous with callbacks and per-request private data. Since this
 is massive overkill in many cases, we have a plain vanilla synchronous
-I/O layer that your programs can use if they don't need the extra power.
+I/O layer that programs can use if they don't need the extra power.
 
 Asynchronous I/O is conceptually pretty simple. Your program makes an
 I/O request. The system takes that request and returns control to your
@@ -583,7 +583,8 @@
 the I/O request. Once satisfied, the system notifies your program in
 some way. Since there can be multiple requests outstanding, and you can't
 be sure exactly what your program will be doing when a request is
-satisfied, programs that make use of asynchronous I/O can be complex.
+satisfied, programs that make use of asynchronous I/O can become very
+complex.
 
 X<synchronous I/O>
 Synchronous I/O is even simpler. Your program makes a request to the
@@ -597,8 +598,7 @@
 have a much higher throughput than a synchronous system. They move
 data around much faster--in some cases three or four times faster.
 This is because the system can be busy moving data to or from disk
-while your program is busy processing data that it got from a previous
-request.
+while your program is busy processing the next set of data.
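The overlap described above is easy to see in a toy model. This is a rough Python sketch (not Parrot's actual I/O API): a request returns immediately, the read happens on a worker thread, and a callback delivers the result while the caller is free to do other work.

```python
import threading, queue

def async_read(path, callback):
    """Issue a read request and return at once; the callback runs on a
    worker thread when the data is ready (a toy model, not Parrot's API)."""
    def worker():
        with open(path) as f:      # the blocking read happens off-thread
            data = f.read()
        callback(data)             # notify the requester
    t = threading.Thread(target=worker)
    t.start()
    return t

done = queue.Queue()
request = async_read(__file__, lambda data: done.put(len(data)))
# The caller can keep working here while the read is in flight...
nbytes = done.get()                # ...and collect the result later
request.join()
print(nbytes > 0)
```

A real asynchronous I/O layer would use OS facilities rather than a thread per request, but the shape of the interface (request, continue working, callback) is the same.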
 
 For disk devices, having multiple outstanding requests--especially on
 a busy system--allows the system to order read and write requests to
@@ -654,35 +654,41 @@
 
 Parrot's events are fairly simple. An event has an event type, some
 event data, an event handler, and a priority. Each thread has an event
-queue, and when an event happens it's put into the right thread's
-queue (or the default thread queue in those cases where we can't tell
-which thread an event was destined for) to wait for something to
-process it.
-
-Any operation that would potentially block drains the event queue
-while it waits, as do a number of the cleanup opcodes that Parrot uses
-to tidy up on scope exit. Parrot doesn't check each opcode for an
-outstanding event for pure performance reasons, as that check gets
-expensive quickly. Still, Parrot generally ensures timely event
-handling, and events shouldn't sit in a queue for more than a few
-milliseconds unless event handling has been explicitly disabled.
+queue, and when an event occurs it is put into the queue for the correct
+thread. Once in the queue, events must wait until an event handler gets
+a chance to process them. If there is no clear destination thread for an
+event, it is put into a default queue where it can be processed.
+
+Any operation that would potentially block normal operation, such as a
+C<sleep> command or the cleanup operations that Parrot calls when it
+exits a subroutine, causes the event handlers to work through the events
+in the queue. In this way, when your program thinks it is just waiting,
+it is actually getting a lot of work done in the background. Parrot
+doesn't check for an outstanding event during every opcode. This is a
+pure performance consideration: all those checks would get expensive
+very quickly. Still, Parrot generally ensures timely event handling, and
+events shouldn't ever be ignored for more than a few milliseconds.
+N<Unless asynchronous event handling is explicitly disabled, in which
+case events will stay ignored for as long as the programmer wants.>
 
 When Parrot does extract an event from the event queue, it calls that
 event's event handler, if it has one. If an event doesn't have a
 handler, Parrot instead looks for a generic handler for the event type
 and calls it instead. If for some reason there's no handler for the
-event type, Parrot falls back to the generic event handler, which
-throws an exception when it gets an event it doesn't know how to
-handle.  You can override the generic event handler if you want Parrot
-to do something else with unhandled events, perhaps silently
-discarding them instead.
+event type, Parrot falls back to the generic event handler, which
+throws an exception as a last resort. You can override the generic event
+handler if you want Parrot to do something else with unhandled events,
+perhaps silently discarding them instead.
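The dispatch chain (per-event handler, then type handler, then generic fallback) and the priority ordering can be sketched together. This is a toy Python model, not Parrot's real event API:

```python
import heapq

class EventQueue:
    """Toy model of a per-thread event queue (not Parrot's real API):
    events carry a type, data, an optional handler, and a priority."""
    def __init__(self, type_handlers, generic_handler):
        self.heap = []
        self.counter = 0       # tie-breaker keeps same-priority FIFO order
        self.type_handlers = type_handlers
        self.generic_handler = generic_handler

    def post(self, priority, etype, data, handler=None):
        heapq.heappush(self.heap,
                       (priority, self.counter, etype, data, handler))
        self.counter += 1

    def drain(self):
        handled = []
        while self.heap:
            _, _, etype, data, handler = heapq.heappop(self.heap)
            # per-event handler, else type handler, else generic fallback
            fn = handler or self.type_handlers.get(etype, self.generic_handler)
            handled.append(fn(data))
        return handled

def generic(data):
    raise RuntimeError("event with no handler")   # last-resort behavior

q = EventQueue({"io": lambda d: ("io", d)}, generic)
q.post(5, "io", "read done")                      # lower number = sooner
q.post(1, "timer", "tick", handler=lambda d: ("timer", d))
results = q.drain()
print(results)    # the priority-1 timer event is handled first
```

Overriding the generic fallback corresponds to replacing `generic` here with something that, say, quietly drops the event.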
 
 Because events are handled in mainline code, they don't have the
 restrictions commonly associated with interrupt-level code. It's safe
 and acceptable for an event handler to throw an exception, allocate
 memory, or manipulate thread or global state safely. Event handlers
-can even acquire locks if they need to, though it's not a good idea to
-have an event handler blocking on lock acquisition.
+can even acquire locks if they need to. Even though event handlers have
+all these capabilities, that doesn't mean they should be used with
+impunity. An event handler blocking on a lock can easily deadlock a
+program that hasn't been properly designed. Parrot gives you plenty of
+rope; it's up to the programmer not to trip on it.
 
 Parrot uses the priority on events for two purposes. First, the
 priority is used to order the events in the event queue. Events for a
@@ -700,36 +706,44 @@
 default priority of an event, or adjusting the current minimum
 priority level, is a rare occurrence.  It's almost always a mistake to
 change them, but the capability is there for those rare occasions
-where it's the correct thing to
-do.
+where it's the correct thing to do.
 
 =head2 Signals
 
 Z<CHP-7-SECT-4.3>
 
 X<signals, Parrot>
-Signals are a special form of event, based on the Unix signal mechanism.
-Parrot presents them as mildly special, as a remnant of Perl's Unix
-heritage, but under the hood they're not treated any differently from
-any other event.
+Signals are a special form of event, based on the standard Unix signal
+mechanism. Even though signals are occasionally described as being
+special in some way, under the hood they're treated like any other event.
+The primary difference between a signal and an event is that signals
+have names that Unix and Linux programmers will recognize. This can be
+a little confusing, especially when the Parrot signal doesn't use exactly
+the same semantics as the Unix signal does.
 
 The Unix signaling mechanism is something of a mash, having been
-extended and worked on over the years by a small legion of undergrad
-programmers. At this point, signals can be divided into two
-categories, those that are fatal, and those that aren't.
+extended and worked on over the years by a small legion of ambitious but
+underpaid programmers. There are generally two types of signals to deal
+with: those that are fatal, and those that are not.
 
 X<fatal signals> 
 Fatal signals are things like X<SIGKILL>
-SIGKILL, which unconditionally kills a process, or SIGSEGV, which
-indicates that the process has tried to access memory that isn't part
-of your process.  There's no good way for Parrot to catch these
-signals, so they remain fatal and will kill your process.  On some
-systems it's possible to catch some of the fatal signals, but
-Parrot code itself operates at too high a level for a user program to
-do anything with them--they must be handled with special-purpose code
-written in C or some other low-level language.  Parrot itself may
-catch them in special circumstances for its own use, but that's an
-implementation detail that isn't exposed to a user program.
+SIGKILL, which unconditionally kills a process, or X<SIGSEGV> SIGSEGV,
+which indicates that the process has tried to access memory that isn't
+part of your process. Most programmers know SIGSEGV better as a
+"segmentation fault", something that should be avoided at all costs.
+There's no good way for Parrot to catch and handle these signals, since
+they occur at a lower level in the operating system and are typically
+presented to Parrot long after anything can be done about them. These
+signals will therefore always kill Parrot and whatever programs were
+running on top of it. On some systems it's possible to catch some of
+the fatal signals, but Parrot code itself operates at too high a level
+for a user program to do anything with them. Any handlers
+for these kinds of signals would have to be written at the lowest levels
+in C or a similar language, something that cannot be accessed directly
+from PIR, PASM, or any of the high-level languages that run on Parrot.
+Parrot itself may try to catch these signals in special circumstances for
+its own use, but that functionality isn't exposed to a user program.
 
 X<non-fatal signals>
 Non-fatal signals are things like X<SIGCHLD> SIGCHLD, indicating that a
@@ -763,55 +777,73 @@
 Threads are a means of splitting a process into multiple pieces that
 execute simultaneously.  It's a relatively easy way to get some
 parallelism without too much work. Threads don't solve all the
-parallelism problems your program may have. Sometimes multiple
-processes on a single system, multiple processes on a cluster, or
-processes on multiple separate systems are better. But threads do
-present a good solution for many common cases.
+parallelism problems your program may have N<In fact, threading can
+cause its own parallelism problems if you aren't careful.>.
+Sometimes multiple processes on a single system, multiple processes
+on a cluster, or processes on multiple separate systems are better
+for parallelized tasks than threads.
 
 All the resources in a threaded process are shared between threads.
-This is simultaneously the great strength and great weakness of
-threads. Easy sharing is fast sharing, making it far faster to
+This is simultaneously the great strength and great weakness of the
+method. Easy sharing is fast sharing, making it far faster to
 exchange data between threads or access shared global data than to
 share data between processes on a single system or on multiple
-systems. Easy sharing is dangerous, though, since without some sort of
-coordination between threads it's easy to corrupt that shared data.
+systems. Easy sharing of data can be dangerous, though, since data can
+be corrupted if the threads don't coordinate between themselves somehow.
 And, because all the threads are contained within a single process, if
-any one of them fails for some reason the entire process, with all its
-threads, dies.
+any one of them causes a fatal error, Parrot and all the programs and
+threads running on top of it die.
 
 With a low-level language such as C, these issues are manageable. The
 core data types, integers, floats, and pointers are all small enough
-to be handled atomically. Composite data can be protected with
-mutexes, special structures that a thread can get exclusive access to.
-The composite data elements that need protecting can each have a mutex
-associated with them, and when a thread needs to touch the data it
-just acquires the mutex first. By default there's very little data
-that must be shared between threads, so it's relatively easy, barring
-program errors, to write thread-safe code if a little thought is given
-to the program structure.
+to be handled atomically. You never have to worry that two threads will
+try to write a value to the same integer variable, and the result will
+be a corrupted combination of the two. It will be one or the other value,
+depending on which thread wrote to the memory location last. Composite
+data structures, on the other hand, are not handled atomically. Two
+threads both accessing a large data structure can write incompatible data
+into different fields. To avoid this, these stuctures can be protected
+with special devices called mutexes. Mutexes N<depending on the exact
+implementation and semantics, Mutexes can also be known as locks,
+spinlocks, semaphores, or critical sections.> are special structures that
+a thread can get exclusive access to. Like a baton in a relay race, only
+one thread can own a mutex at a time, and by convention only the thread
+with the mutex can access the associated data. The composite data
+elements that need protecting can each have their own mutex, and when a
+thread tries to touch the data it must acquires the mutex first. If
+another thread already has the mutex, all other threads must wait before
+they can get the mutex and access the data. By default there's very
+little data that must be shared between threads, so it's relatively easy
+to write thread-safe code if a little thought is given to the program
+structure. Thread safety is far too big a topic to cover in this book,
+but trust us when we say it's something worth being concerned with.
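The mutex-per-structure pattern looks like this in miniature. A Python sketch (Parrot's internals are C, but the discipline is the same): every touch of the shared structure happens with the lock held, so no updates are lost.

```python
import threading

counter = {"value": 0}           # composite data shared between threads
lock = threading.Lock()          # the mutex guarding it

def add(n):
    for _ in range(n):
        with lock:               # acquire the mutex before touching the data
            counter["value"] += 1
                                 # ...and release it automatically here

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])          # exactly 400000: no lost updates
```

Remove the `with lock:` and the same program can, depending on the platform and interpreter, lose increments when two threads read the old value before either writes the new one.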
 
 X<Parrot;native data type;;(see PMCs)>
 X<PMCs (Parrot Magic Cookies);Parrot's native data type> 
-Things aren't this easy for Parrot, unfortunately. A PMC, Parrot's
-native data type, is a complex structure, so we can't count on the
-hardware to provide us atomic access. That means Parrot has to provide
-atomicity itself, which is expensive. Getting and releasing a mutex
-isn't really that expensive in itself. It has been heavily optimized by
-platform vendors because they want threaded code to run quickly. It's
-not free, though, and when you consider that running flat-out Parrot
-does one PMC operation per 100 CPU cycles, even adding an additional 10
-cycles per operation can slow Parrot down by 10%.
-
-For any threading scheme, it's important that your program isn't
-hindered by the platform and libraries it uses. This is a common
-problem with writing threaded code in C, for example. Many libraries
-you might use aren't thread-safe, and if you aren't careful with them
-your program will crash. While we can't make low-level libraries any
-safer, we can make sure that Parrot itself won't be a danger. There is
-very little data shared between Parrot interpreters and threads, and
-access to all the shared data is done with coordinating mutexes. This
-is invisible to your program, and just makes sure that Parrot itself
-is thread-safe.
+PMCs are complex structures, even the simplest ones. We can't count on
+the hardware or even the operating system to provide us atomic access.
+Parrot has to provide that atomicity itself, which is expensive. Getting
+and releasing a mutex is an inexpensive operation by itself, and has
+been heavily optimized by platform vendors because they want threaded
+code to run quickly. It's not free, though, and when you consider that
+Parrot must access hundreds or even thousands of PMCs for some programs,
+any operations that get performed for all accesses can impose a huge
+performance penalty.
+
+=head3 External Libraries
+
+Even if your program is thread-safe, and Parrot itself is thread-safe,
+that doesn't mean there is no danger. Many libraries that Parrot uses
+or that your program taps into through NCI may not be thread-safe, and
+may crash your program if you attempt to use them in a threaded
+environment. Parrot cannot make existing unsafe libraries any safer
+N<We can send nagging bug reports to the library developers.>, but at
+least Parrot itself won't introduce new problems. Whenever you're using
+an external library, you should double-check that it's safe to use in
+a threaded environment. If you aren't using threading in your programs,
+you don't need to worry about it.
+
+=head3 Threading Models
 
 When you think about it, there are really three different threading
 models. In the first one, multiple threads have no interaction among
@@ -830,25 +862,25 @@
 around structured data.
 
 In the third threading model, multiple threads run and share data
-between themselves. While Parrot can't guarantee that data at the user
-level remains consistent, it can make sure that access to shared data
-is at least safe. We do this with two mechanisms.
+between themselves directly. While Parrot can't guarantee that data at
+the user level remains consistent, it can make sure that access to shared
+data is at least safe. We do this with two mechanisms.
 
 First, Parrot presents an advisory lock system to user code. Any piece
 of user code running in a thread can lock a variable. Any attempt to
 lock a variable that another thread has locked will block until the
 lock is released. Locking a variable only blocks other lock attempts.
-It does I<not> block plain access. This may seem odd, but it's the
-same scheme used by threading systems that obey the POSIX thread
-standard, and has been well tested in practice.
+It does I<not> block access. This may seem odd, but it's the same scheme
+used by threading systems that obey the POSIX thread standard, and has
+been well tested in practice.
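Advisory semantics are easy to demonstrate. In this Python sketch (a stand-in for Parrot's lock operation, not its real API), holding the lock stops other lockers but does nothing to code that simply ignores the convention:

```python
import threading

shared = {"value": 0}
advisory = threading.Lock()      # stand-in for Parrot's advisory variable lock

def polite_writer():
    with advisory:               # cooperating code takes the lock first
        shared["value"] += 1

def rude_writer():
    shared["value"] += 10        # plain access is NOT blocked by the lock

advisory.acquire()               # main thread holds the advisory lock
rude = threading.Thread(target=rude_writer)
rude.start()
rude.join()                      # succeeds even while the lock is held
print(shared["value"])           # 10: the lock only stops other lockers
advisory.release()
polite = threading.Thread(target=polite_writer)
polite.start()
polite.join()
print(shared["value"])           # 11: the polite writer got the lock and ran
```

This is the same discipline POSIX mutexes impose: protection only works when every thread that touches the data agrees to take the lock first.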
 
 Secondly, Parrot forces all shared PMCs to be marked as such, and all
 access to shared PMCs must first acquire that PMC's private lock. This
-is done by installing an alternate vtable for shared PMCs, one that
+is done by installing an alternate VTABLE for shared PMCs, one that
 acquires locks on all its parameters. These locks are held only for
-the duration of the vtable function, but ensure that the PMCs affected
-by the operation aren't altered by another thread while the vtable
-function is in progress.
+the duration of the VTABLE call, but they ensure that the PMCs
+affected by the operation aren't altered by another thread while the
+VTABLE operation is in progress.
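The alternate-vtable trick amounts to wrapping every operation on the shared object so that it runs under the object's private lock. A Python proxy sketches the idea (Parrot does this in C by swapping the PMC's vtable; the names below are illustrative only):

```python
import threading

class SharedProxy:
    """Sketch of the shared-PMC idea: every operation on the wrapped
    object holds the object's private lock for the call's duration."""
    def __init__(self, obj):
        self._obj = obj
        self._lock = threading.Lock()   # the object's private lock

    def __getattr__(self, name):
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr
        def locked(*args, **kwargs):
            with self._lock:            # held only for this one call
                return attr(*args, **kwargs)
        return locked

items = SharedProxy([])
items.append(1)                          # each call runs under the lock
items.append(2)
print(items.pop())                       # pops the 2 that was appended last
```

As in Parrot, the lock guards each individual operation, not a whole sequence of them; a multi-step update still needs a user-level lock around it.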
 
 =head1 Objects
 
