Author: Whiteknight
Date: Mon Dec 8 16:20:58 2008
New Revision: 33678

Modified:
   trunk/docs/book/ch12_opcodes.pod

Log:
[Book] small updates to chapter 11, and adding some content to chapter 12

Modified: trunk/docs/book/ch12_opcodes.pod
==============================================================================
--- trunk/docs/book/ch12_opcodes.pod (original)
+++ trunk/docs/book/ch12_opcodes.pod Mon Dec 8 16:20:58 2008
@@ -4,107 +4,224 @@

 Z<CHP-11>

-The smallest executable component is not the compilation unit or even the subroutine,
-but is in fact the opcode. Opcodes in PASM, like opcodes in other assembly languages,
-are individual instructions that implement low-level operations in Parrot N<In the
-world of microprocessors, the word "opcode" typically refers to the numeric identifier
-for each instructions. The human-readable word used in the associated assembly language
-is called the "mnemonic". An assembler, among other tasks, is responsible for converting
-mnemonics into opcodes for execution. In Parrot, instead of referring to an instruction
-by different names depending on what form it's in, we just call them all "opcodes">. Of
-course the list of things that qualify as "low-level" in Parrot can be pretty advanced
-compared to the functionality supplied by regular assembly language opcodes.
+The smallest executable component is not the compilation unit or even
+the subroutine, but is in fact the opcode. Opcodes in PASM, like opcodes
+in other assembly languages, are individual instructions that implement
+low-level operations in Parrot N<In the world of microprocessors, the
+word "opcode" typically refers to the numeric identifier for each
+instruction. The human-readable word used in the associated assembly
+language is called the "mnemonic". An assembler, among other tasks, is
+responsible for converting mnemonics into opcodes for execution. In
+Parrot, instead of referring to an instruction by different names
+depending on what form it's in, we just call them all "opcodes">. Of
+course, the list of things that qualify as "low-level" in Parrot can be
+pretty advanced compared to the functionality supplied by regular
+assembly language opcodes.

-Before we talk about opcodes, we have to a little bit of talking about the various
-runcores that invoke them.
+Before we talk about opcodes, we need to talk a little about
+the various runcores that invoke them.

 =head2 Runcores

-During execution, the runcore is like the heart of Parrot. The runcore controls calling
-the various opcodes with the correct data, and making sure that program flow moves
-properly. Some runcores, such as the I<precomputed C goto runcore> are optimized for
-speed and don't perform many tasks beyond finding and dispatching opcodes. Other runcores,
-such as the I<GC-Debug>, I<debug> and I<profiling> runcores help with typical software
-maintenance and analysis tasks. Different runcores, because of the way they are structured,
-require the opcodes to be compiled into different forms. Because of this, understanding
-opcodes first requires an understanding of the Parrot runcores.
+During execution, the runcore is like the heart of Parrot. The runcore
+controls calling the various opcodes with the correct data, and making
+sure that program flow moves properly. Some runcores, such as the
+I<precomputed C goto runcore>, are optimized for speed and don't perform
+many tasks beyond finding and dispatching opcodes. Other runcores,
+such as the I<GC-Debug>, I<debug> and I<profiling> runcores, help with
+typical software maintenance and analysis tasks. We'll talk about all
+of these throughout the chapter.
+
+Different runcores, because of the way they are structured, require the
+opcodes to be compiled into different forms. Because of this,
+understanding opcodes first requires an understanding of the Parrot
+runcores.

 =head3 Types of Runcores

-Parrot has multiple runcores. Some are useful for particular maintenance tasks, some are
-only available as optimizations in certain compilers, some are intended for general use,
-and some are just interesing flights of fancy with no practical benefits. One runcore that
-we've already seen is the debugging runcore which prompts the user for commands between
-executing each opcode. Another valuable maintenance runcore is the GC dubug core (which runs a
-full sweep of the garbage collector between each opcode).
+Parrot has multiple runcores. Some are useful for particular maintenance
+tasks, some are only available as optimizations in certain compilers,
+some are intended for general use, and some are just interesting flights
+of fancy with no practical benefits. Here we list the various runcores,
+their uses, and their benefits.

 =over 4

 =item* Slow Core

-The slow core is a basic runcore design that treats each opcode as a separate function
-at the C level. Each function is called, and returns the address of the next opcode
-to be called by the core. The slow core performs bounds checking to ensure that the next
-opcode to be called is properly in bounds. Because of this modular approach where opcodes
-are treated as separate executable entities many other runcores, especially diagnostic and
-maintenance cores are based on this design.
+The slow core is a basic runcore design that treats each opcode as a
+separate function at the C level. Each function is called, and returns
+the address of the next opcode to be called by the core. The slow core
+performs bounds checking to ensure that the next opcode to be called is
+properly in bounds, and not somewhere random in memory. Because of this
+modular approach where opcodes are treated as separate executable
+entities, many other runcores, especially diagnostic and maintenance
+cores, are based on this design.

 =item* Fast Core

-The fast core is a bare-bones core that doesn't do any of the bounds-checking or context
-updating that the slow core does.
+The fast core is a bare-bones core that doesn't do any of the
+bounds-checking or context updating that the slow core does. The fast
+core is the way Parrot should run, and is used to find and debug places
+where execution strays outside of its normal bounds.

 =item* Computed Goto Core

-I<Computed Goto> is a feature of some C compilers where a label is treated as a piece of
-data that can be stored in an array. Each opcode is simply a label in a very large
-function, and the labels are stored in an array. Calling an opcode is as easy as taking
-that opcode's number as the index of the label array, and calling the associated label.
-Sound complicated? It is a little, especially to C programmers who are not used to these
-kinds of features, and who have been taught that the C<goto> keyword is to be avoided.
+I<Computed Goto> is a feature of some C compilers where a label is
+treated as a piece of data that can be stored in an array. Each opcode
+is simply a label in a very large function, and the labels are stored
+in an array. Calling an opcode is as easy as taking that opcode's number
+as the index of the label array, and jumping to the associated label.
+Sound complicated? It is a little, especially to C programmers who are
+not used to these kinds of features, and who have been taught that the
+C<goto> keyword is to be avoided.

-As was mentioned earlier, not all compilers support computed goto, which means that this
-core will not be built on platforms that don't support it.
+As was mentioned earlier, not all compilers support computed goto, which
+means that this core will not be built on platforms that don't support it.

 =item* Precomputed Goto Core

-Thought the Computed Goto core was hard enough to understand? Precomputed goto takes the
-concept a little further.
+Thought the Computed Goto core was hard enough to understand? Precomputed
+goto takes the concept a little further.

 =item* Tracing Core

 =item* Profiling Core

+The profiling core analyzes the performance of Parrot, and helps to
+determine where bottlenecks and trouble spots are in the programs that
+run on top of Parrot.
+
 =item* GC Debug Core

+Parrot's garbage collector has been known as a weakness in the system
+for several years. In fact, the garbage collector and memory management
+subsystem was one of the last systems to be improved and rewritten before
+the release of version 1.0. It's not that garbage collection isn't
+important, but instead that it was so hard to do earlier in the project.
+
+Early on when the GC was such a weakness, and later when the GC was under
+active development, it was useful to have an operational mode that would
+really exercise the GC and find bugs that otherwise could hide by sheer
+chance. The GC debug runcore was this tool. The core executes a complete
+collection iteration between every single opcode. The throughput
+performance is terrible, but that's not the point: it's almost guaranteed
+to find problems in the memory system if they exist.
+
 =item* Debug Core

+The debug core works like a normal software debugger, such as GDB. The
+debug core executes each opcode, and then prompts the user to enter a
+command. These commands can be used to continue execution, step to the
+next opcode, or examine and manipulate data from the executing program.
+
 =back

 =head2 Opcodes

-Opcodes are the smallest logical execution element in Parrot. An individual opcode
-corresponds, in an abstract kind of way, with a single machine code instruction
-for a particular hardware processor architecture. The difference is that Parrot's
-opcodes can perform some very complex tasks. Also, Parrot's opcodes can be dynamically
-loaded in from a special library file called a I<dynop library>. We'll talk about
-dynops a little bit later
+Opcodes are the smallest logical execution element in Parrot. An
+individual opcode corresponds, in an abstract kind of way, with a single
+machine code instruction for a particular hardware processor
+architecture. The difference is that Parrot's opcodes can perform some
+very complex and high-level tasks. Also, Parrot's opcodes can be
+dynamically loaded in from a special library file called a I<dynop
+library>. We'll talk about dynops a little bit later.

 =head3 Opcode naming

+To PIR and PASM programmers, opcodes appear to be polymorphic. That
+is, some opcodes appear to have multiple argument formats. This is just an
+illusion, however. Parrot opcodes are not polymorphic, although certain
+features make them appear that way. Different argument list formats
+are detected during parsing and translated into separate and unique
+opcode names.
+
 =head3 Opcode Multiple Dispatch

 =head2 Writing Opcodes

-Writing Opcodes, like writing PMCs, is done in a C-like language which is later
-compiled into C code by the X<opcode compiler> opcode compiler. The opcode script
-represents a thin overlay on top of ordinary C code: All valid C code is valid
-Opcode script. There are a few neat additions that make writing Opcodes easier.
+Writing Opcodes, like writing PMCs, is done in a C-like language which is
+later compiled into C code by the X<opcode compiler> opcode compiler. The
+opcode script represents a thin overlay on top of ordinary C code: All
+valid C code is valid opcode script. There are a few neat additions that
+make writing opcodes easier. This script is very similar to that used to
+define PMCs. The C<INTERP> constant, for instance, is always available
+in opcodes, just as it is in VTABLE and METHOD declarations. Unlike
+VTABLEs and METHODs, opcodes are defined with the C<op> keyword.
+
+Opcodes are written in files with the C<.ops> extension. The core
+operation files are stored in the C<src/ops/> directory.

 =head3 Opcode Parameters

+Each opcode can take any fixed number of input and output arguments. These
+arguments can be any of the four primary data types--INTVALs, PMCs, NUMBERS
+and STRINGs--but can also be one of several other types of values including
+LABELs, KEYs and INTKEYs.
+
+Each parameter can be an input, an output or both, using the C<in>, C<out>,
+and C<inout> keywords respectively. Here is an example:
+
+    op Foo (out INT, in NUM)
+
+This opcode could be called like this:
+
+    $I0 = Foo $N0     # in PIR syntax
+    Foo $I0, $N0      # in PASM syntax
+
+When Parrot parses through the file and sees the C<Foo> operation, it
+converts it to the real name C<Foo_i_n>. The real name of an opcode
+is its name followed by an underscore-separated ordered list of
+the parameters to that opcode. This is how Parrot appears to use
+polymorphism: It translates the overloaded opcode common names into
+longer unique names depending on the parameter list of that opcode. Here
+is a list of some of the variants of the C<add> opcode:
+
+    add_i_i      # $I0 += $I1
+    add_n_n      # $N0 += $N1
+    add_p_p      # $P0 += $P1
+    add_i_i_i    # $I0 = $I1 + $I2
+    add_p_p_i    # $P0 = $P1 + $I0
+    add_p_p_n    # $P0 = $P1 + $N0
+
+This isn't a complete list, but you should get the picture. Each different
+combination of parameters translates to a different unique operation, and
+each operation is remarkably simple to implement. In some cases, Parrot
+can even use its multi-method dispatch system to call opcodes which are
+heavily overloaded, or for which there is no exact fit but the parameters
+could be coerced into different types to complete the operation. For
+instance, attempting to add a STRING to a PMC might coerce the string into
+a numerical type first, and then dispatch to the C<add_p_p_n> opcode. This
+is just an example, and the exact mechanisms may change as more opcodes
+are added or old ones are deleted.
+
 =head3 Opcode Control Flow

+Some opcodes have the ability to alter control flow of the program they
+are in. There are a number of control behaviors that can be implemented,
+such as an unconditional jump in the C<goto> opcode, or a subroutine
+call in the C<call> opcode, or the conditional behavior implemented by C<if>.
+
+At the end of each opcode you can call a C<goto> operation to jump to the
+next opcode to execute. If no C<goto> is performed, control flow will
+continue like normal to the next operation in the program. In this way,
+opcodes can easily manipulate control flow. Opcode script provides a
+number of keywords to alter control flow:
+
+=over 4
+
+=item * NEXT()
+
+C<NEXT> contains the address of the next opcode in memory. You don't
+need to call C<goto NEXT()>, however, because the default behavior for
+all opcodes is to automatically jump to the next opcode in the program
+N<You can do this if you really want to, but it really wouldn't help you
+any>. The C<NEXT> keyword is frequently used in places like the C<invoke>
+opcode to create a continuation to the next opcode to return to after
+the subroutine returns.
+
+=back
+
 =head2 The Opcode Compiler

 =head2 Dynops
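The slow core described in the chapter text above (each opcode is a separate C function that hands back the location of the next opcode, and the core checks that location before dispatching) can be illustrated with a toy interpreter. This is a minimal sketch, not Parrot's runcore. `interp_t`, the three opcodes and `run_slow_core` are hypothetical names. Each opcode returns an index into the bytecode rather than a raw pointer, which keeps the bounds check well-defined C.

```c
/* Hypothetical toy VM for illustration only; not Parrot's actual code. */
typedef long opcode_t;

#define NUM_REGS 4
typedef struct { long regs[NUM_REGS]; } interp_t;

/* Each opcode is a separate C function. It receives the index of its own
   opcode word and returns the index of the next opcode, or -1 to halt. */
typedef long (*op_func_t)(const opcode_t *code, long pc, interp_t *interp);

enum { OP_END, OP_SET, OP_ADD, OP_COUNT };

static long op_end(const opcode_t *code, long pc, interp_t *interp)
{
    (void)code; (void)pc; (void)interp;
    return -1;                                   /* stop the run loop */
}

/* set I(a), const */
static long op_set(const opcode_t *code, long pc, interp_t *interp)
{
    interp->regs[code[pc + 1]] = code[pc + 2];   /* register operands are trusted here */
    return pc + 3;                               /* skip opcode word + 2 operands */
}

/* add I(a), I(b)   means   I(a) += I(b) */
static long op_add(const opcode_t *code, long pc, interp_t *interp)
{
    interp->regs[code[pc + 1]] += interp->regs[code[pc + 2]];
    return pc + 3;
}

static const op_func_t op_table[OP_COUNT] = { op_end, op_set, op_add };
static const int op_arity[OP_COUNT] = { 0, 2, 2 };

/* Returns 0 after OP_END, or -1 if execution would leave the bytecode. */
static int run_slow_core(const opcode_t *code, long len, interp_t *interp)
{
    long pc = 0;
    while (pc != -1) {
        if (pc < 0 || pc >= len)                 /* next opcode is out of bounds */
            return -1;
        opcode_t op = code[pc];
        if (op < 0 || op >= OP_COUNT)            /* unknown opcode number */
            return -1;
        if (pc + op_arity[op] >= len)            /* operands run past the end */
            return -1;
        pc = op_table[op](code, pc, interp);     /* one C function call per opcode */
    }
    return 0;
}
```

For example, the program `{ OP_SET, 0, 40, OP_SET, 1, 2, OP_ADD, 0, 1, OP_END }` leaves 42 in register 0. A program that lacks `OP_END` runs off the end of its array, and the bounds check rejects it instead of executing whatever happens to follow in memory.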
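The computed goto idea from the runcore list can be sketched with the "labels as values" extension that GCC and Clang provide: `&&label` takes a label's address as data, and `goto *ptr` jumps to it. The opcode set (`CG_INC`, `CG_DOUBLE`, `CG_END`) and `run_cgoto` are hypothetical and only illustrate the dispatch pattern; this is not Parrot's computed goto core. The `#else` branch shows the kind of plain `switch` fallback a compiler without the extension would need.

```c
/* Hypothetical toy opcode numbers for this sketch only. */
enum { CG_END, CG_INC, CG_DOUBLE, CG_COUNT };

/* Runs a toy program on a single accumulator and returns its final value.
   Assumes well-formed input: every opcode is below CG_COUNT and the program
   ends with CG_END. */
static long run_cgoto(const int *code)
{
    long acc = 0;
    const int *pc = code;
#if defined(__GNUC__)
    /* "Labels as values": each entry is the address of a label in this function. */
    static void *const labels[CG_COUNT] = { &&do_end, &&do_inc, &&do_double };
#define DISPATCH() goto *labels[*pc]   /* the opcode number indexes the label array */

    DISPATCH();
do_inc:
    acc += 1;
    pc++;
    DISPATCH();
do_double:
    acc *= 2;
    pc++;
    DISPATCH();
do_end:
    return acc;
#undef DISPATCH
#else
    /* Fallback for compilers without computed goto: a central switch loop. */
    for (;;) {
        switch (*pc++) {
        case CG_INC:    acc += 1; break;
        case CG_DOUBLE: acc *= 2; break;
        default:        return acc;   /* CG_END */
        }
    }
#endif
}
```

With `{ CG_INC, CG_INC, CG_DOUBLE, CG_INC, CG_END }` the accumulator goes 1, 2, 4, 5, so the function returns 5. Each handler ends with its own indirect jump instead of returning to a central loop. That per-opcode jump is what separates this style from the function-call dispatch of the slow core.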
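The renaming described under "Opcode Parameters" (for instance, `Foo $I0, $N0` becoming `Foo_i_n`) amounts to building a unique name from the argument types and then looking that name up. The sketch below uses the single-letter codes `i`, `n` and `p` from the `add` variants listed in the chapter. The `s` code for STRING registers is an assumption. `mangle_op_name` and `resolve_op` are hypothetical helpers, not Parrot's parser API, and the lookup only finds exact matches, with no coercion or multi-method dispatch.

```c
#include <stddef.h>
#include <string.h>

/* Maps a register kind to its name letter. 'i', 'n' and 'p' follow the
   add_* examples; 's' for STRING is an assumption of this sketch. */
static char type_letter(char reg_kind)
{
    switch (reg_kind) {
    case 'I': return 'i';
    case 'N': return 'n';
    case 'S': return 's';
    case 'P': return 'p';
    default:  return '?';
    }
}

/* Builds "<name>_<t1>_<t2>..." from argument register kinds,
   e.g. "add" + "III" gives "add_i_i_i".
   Returns 0 on success, -1 on an unknown kind or if buf is too small. */
static int mangle_op_name(char *buf, size_t size, const char *name,
                          const char *arg_kinds)
{
    size_t len = strlen(name);
    if (len + 1 > size)
        return -1;
    memcpy(buf, name, len + 1);
    for (const char *k = arg_kinds; *k; k++) {
        char t = type_letter(*k);
        if (t == '?' || len + 3 > size)   /* need room for '_', letter, NUL */
            return -1;
        buf[len++] = '_';
        buf[len++] = t;
        buf[len] = '\0';
    }
    return 0;
}

/* A few of the unique op names listed in the chapter. */
static const char *const known_ops[] = {
    "add_i_i", "add_n_n", "add_p_p", "add_i_i_i", "add_p_p_i", "add_p_p_n",
};

/* Resolves an overloaded call to the index of its unique op,
   or -1 if there is no exact match. */
static int resolve_op(const char *name, const char *arg_kinds)
{
    char full[64];
    if (mangle_op_name(full, sizeof full, name, arg_kinds) != 0)
        return -1;
    for (size_t i = 0; i < sizeof known_ops / sizeof known_ops[0]; i++)
        if (strcmp(known_ops[i], full) == 0)
            return (int)i;
    return -1;
}
```

Here `resolve_op("add", "PPN")` finds `add_p_p_n`, while `resolve_op("add", "PS")` returns -1 because no `add_p_s` variant is in the table.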
