Author: allison
Date: Mon Mar 6 14:43:44 2006
New Revision: 11805

Added:
   trunk/docs/pdds/clip/pddXX_io.pod
Modified:
   trunk/ (props changed)
   trunk/MANIFEST
Log:
Committing the draft I/O PDD to the "clip" directory as a work-in-progress, so we can easily track changes.

Modified: trunk/MANIFEST
==============================================================================
--- trunk/MANIFEST (original)
+++ trunk/MANIFEST Mon Mar 6 14:43:44 2006
@@ -342,6 +342,7 @@
 docs/pdds/clip/pdd17_basic_types.pod [main]doc
 docs/pdds/clip/pdd18_security.pod [main]doc
 docs/pdds/clip/pdd19_pir.pod [main]doc
+docs/pdds/clip/pddXX_io.pod [main]doc
 docs/pmc/array.pod [main]doc
 docs/pmc/iterator.pod [main]doc
 docs/pmc/perlarray.pod [main]doc

Added: trunk/docs/pdds/clip/pddXX_io.pod
==============================================================================
--- (empty file)
+++ trunk/docs/pdds/clip/pddXX_io.pod Mon Mar 6 14:43:44 2006
@@ -0,0 +1,394 @@
+# Copyright: 2001-2006 The Perl Foundation. All Rights Reserved.
+# $Id $
+
+=head1 NAME
+
+docs/pdds/pddXX_io.pod - Parrot I/O
+
+=head1 ABSTRACT
+
+Parrot's I/O subsystem.
+
+=head1 VERSION
+
+$Revision $
+
+=head1 SYNOPSIS
+
+  open P0, "data.txt", ">"
+  print P0, "sample data\n"
+  close P0
+
+  open P1, "data.txt", "<"
+  S0 = read P1, 12
+  P2 = getstderr
+  print P2, S0
+  close P1
+
+  ...
+
+=head1 DEFINITIONS
+
+A "stream" allows input or output operations on a source/destination
+such as a file, keyboard, or text console. Streams are also called
+"filehandles", though only some of them have anything to do with files.
+
+=head1 DESCRIPTION
+
+This is a draft document defining Parrot's I/O subsystem, for both
+streams and network I/O.
+
+=head2 I/O Stream Opcodes
+
+=head3 Opening and Closing Streams
+
+=over 4
+
+=item *
+
+C<open> opens a stream object based on a string path. It takes an
+optional string argument specifying the mode of the stream (read, write,
+append, read/write, etc.) [Some discussion of the syntax of the format
+strings may be relevant. Currently it uses Perl syntax, but a set of
+defined constants may fit better with Parrot's general architecture.]
+
+=item *
+
+C<close> closes a stream object.
+
+=back
+
+=head3 Retrieving Existing Streams
+
+=over 4
+
+=item *
+
+C<getstdin>, C<getstdout>, and C<getstderr> return a stream object for
+standard input, standard output, and standard error.
+
+=item *
+
+C<fdopen> converts an existing and already open UNIX integer file
+descriptor into a stream object. It also takes a string argument to
+specify the mode.
+
+=back
+
+=head3 Writing to Streams
+
+=over 4
+
+=item *
+
+C<print> writes an integer, float, string, or PMC value to a stream. It
+writes to standard output by default, but optionally takes a PMC
+argument to select another stream to write to.
+
+=item *
+
+C<write> also writes to standard output and cannot select another
+stream. It only accepts a PMC value to write. [Is this redundant?]
+
+=item *
+
+C<printerr> writes an integer, float, string, or PMC value to standard
+error.
+
+=back
+
+=head3 Reading From Streams
+
+=over 4
+
+=item *
+
+C<read> retrieves a specified number of bytes from a stream into a
+string. [Note this is bytes, not codepoints.] By default it reads from
+standard input, but it also takes an alternate stream object source as
+an optional argument.
+
+=item *
+
+C<readline> retrieves a single line from a stream into a string. Calling
+C<readline> flags the stream as operating in line-buffer mode (see
+C<pioctl> below). Lines are truncated at 64K.
+
+=item *
+
+C<peek> retrieves the next byte from a stream into a string, but doesn't
+remove it from the stream. By default it reads from standard input, but
+it also takes a stream object argument for an alternate source.
+
+=back
+
+=head3 Retrieving and Setting Stream Properties
+
+=over 4
+
+=item *
+
+C<seek> sets the current file position of a stream object to an integer
+byte offset from an integer starting position (0 for the start of the
+file, 1 for the current position, and 2 for the end of the file).
+
+=item *
+
+C<tell> retrieves the current file position of a stream object. It also
+has a 64-bit variant that returns the byte offset as two integers (one
+for the first 32 bits of the 64-bit offset, and one for the second 32
+bits).
+
+=item *
+
+C<getfd> retrieves the UNIX integer file descriptor of a stream object,
+or 0 if it doesn't have an integer file descriptor. [Maybe -1 would be a
+better code for "undefined", since standard input is 0.]
+
+=item *
+
+C<pioctl> provides low-level access to the attributes of a stream
+object. It takes a stream object, an integer flag to select a command,
+and a single integer argument for the command. It returns an integer
+indicating the success or failure of the command.
+
+The following constants are defined for the commands that C<pioctl> can
+execute:
+
+  0 PIOCTL_CMDRESERVED
+       No documentation available.
+  1 PIOCTL_CMDSETRECSEP
+       Set the record separator. [This doesn't actually work at the
+       moment.]
+  2 PIOCTL_CMDGETRECSEP
+       Get the record separator.
+  3 PIOCTL_CMDSETBUFTYPE
+       Set the buffer type.
+  4 PIOCTL_CMDGETBUFTYPE
+       Get the buffer type
+  5 PIOCTL_CMDSETBUFSIZE
+       Set the buffer size.
+  6 PIOCTL_CMDGETBUFSIZE
+       Get the buffer size.
+
+The following constants are defined as argument/return values for the
+buffer-type commands:
+
+  0 PIOCTL_NONBUF
+       Unbuffered I/O. Bytes are sent as soon as possible.
+  1 PIOCTL_LINEBUF
+       Line buffered I/O. Bytes are sent when a newline is
+       encountered.
+  2 PIOCTL_BLKBUF
+       Fully buffered I/O. Bytes are sent when the buffer is full.
+       [Called "BLKBUF" because bytes are sent as a block, but line
+       buffering also sends them as a block, so "FULBUF" might make
+       more sense.]
+
+=back
+
+=head2 File opcodes
+
+=over 4
+
+=item *
+
+C<stat> retrieves information about a file on the filesystem. It takes a
+string filename or an integer argument of a UNIX file descriptor, and an
+integer flag for the type of information requested. It returns an
+integer containing the requested information. The following constants
+are defined for the type of information requested (see
+F<runtime/parrot/include/stat.pasm>):
+
+  0 STAT_EXISTS
+       Whether the file exists.
+  1 STAT_FILESIZE
+       The size of the file.
+  2 STAT_ISDIR
+       Whether the file is a directory.
+  3 STAT_ISDEV
+       Whether the file is a device such as a terminal or a disk.
+  4 STAT_CREATETIME
+       The time the file was created.
+       (Currently just returns -1.)
+  5 STAT_ACCESSTIME
+       The last time the file was accessed.
+  6 STAT_MODIFYTIME
+       The last time the file data was changed.
+  7 STAT_CHANGETIME
+       The last time the file metadata was changed.
+  8 STAT_BACKUPTIME
+       The last time the file was backed up.
+       (Currently just returns -1.)
+  9 STAT_UID
+       The user ID of the file.
+  10 STAT_GID
+       The group ID of the file.
+
+=back
+
+=head2 Network I/O Opcodes
+
+Most of these opcodes conform to the standard UNIX interface, but the
+layer API allows alternate implementations for each.
+
+[It's worth considering making all the network I/O opcodes use a
+consistent way of marking errors. At the moment, all return an integer
+status code except for C<socket>, C<sockaddr>, and C<accept>.]
+
+=over 4
+
+=item *
+
+C<socket> returns a new socket object from a given address family,
+socket type, and protocol number (all integers). The socket object's
+boolean value can be tested for whether the socket was created.
+
+=item *
+
+C<sockaddr> returns a string representing a socket address, generated
+from a port number (integer) and an address (string).
+
+=item *
+
+C<connect> connects a socket object to an address. It returns an integer
+indicating the status of the call, -1 if unsuccessful.
+
+=item *
+
+C<recv> receives a message from a connected socket object into a string.
+It returns an integer indicating the status of the call, -1 if
+unsuccessful.
+
+=item *
+
+C<send> sends a message string to a connected socket object. It returns
+an integer indicating the status of the call, -1 if unsuccessful.
+
+=item *
+
+C<poll> polls a socket object for particular types of events (an integer
+flag) at a frequency set by seconds and microseconds (the final two
+integer arguments). It returns an integer indicating the status of the
+call, -1 if unsuccessful. [See the system documentation for C<poll> to
+see the constants for event types and return status.]
+
+=item *
+
+C<bind> binds a socket object to the port and address specified by a
+string address (the packed result of C<sockaddr>). It returns an integer
+indicating the status of the call, -1 if unsuccessful.
+
+=item *
+
+C<listen> listens for a new connection on a socket object. The integer
+argument gives the maximum size of the queue for pending connections.
+It returns an integer indicating the status of the call, -1 if
+unsuccessful.
+
+=item *
+
+C<accept> accepts a new connection on a given socket object, and returns
+a newly created socket object for the connection. Returns NULL if
+unsuccessful.
+
+=back
+
+=head1 IMPLEMENTATION
+
+The Parrot I/O subsystem uses a per-interpreter stack to provide a
+layer-based approach to I/O. Each layer implements a subset of the
+C<ParrotIOLayerAPI> vtable. To find an I/O function, the layer stack is
+searched downwards until a non-NULL function pointer is found for
+that particular slot.
+
+[Below is an excerpt from "Perl 6 and Parrot Essentials", included to
+seed discussion. Note that while Parrot was originally specified as
+having asynchronous I/O, all current opcodes are synchronous I/O.]
+
+Parrot's base I/O system is fully asynchronous I/O with callbacks and
+per-request private data. Since this is massive overkill in many cases,
+we have a plain vanilla synchronous I/O layer that your programs can use
+if they don't need the extra power.
+
+Asynchronous I/O is conceptually pretty simple. Your program makes an
+I/O request. The system takes that request and returns control to your
+program, which keeps running. Meanwhile the system works on satisfying
+the I/O request. When the request is satisfied, the system notifies
+your program in some way. Since there can be multiple requests
+outstanding, and you can't be sure exactly what your program will be
+doing when a request is satisfied, programs that make use of
+asynchronous I/O can be complex.
+
+Synchronous I/O is even simpler. Your program makes a request to the
+system and then waits until that request is done. There can be only
+one request in process at a time, and you always know what you're
+doing (waiting) while the request is being processed. It makes your
+program much simpler, since you don't have to do any sort of
+coordination or synchronization.
+
+The big benefit of asynchronous I/O systems is that they generally
+have a much higher throughput than a synchronous system. They move
+data around much faster--in some cases three or four times faster.
+This is because the system can be busy moving data to or from disk
+while your program is busy processing data that it got from a previous
+request.
+
+For disk devices, having multiple outstanding requests--especially on
+a busy system--allows the system to order read and write requests to
+take better advantage of the underlying hardware. For example, many
+disk devices have built-in track buffers. No matter how small a
+request you make to the drive, it always reads a full track. With
+synchronous I/O, if your program makes two small requests to the same
+track, and they're separated by a request for some other data, the
+disk will have to read the full track twice. With asynchronous I/O, on
+the other hand, the disk may be able to read the track just once, and
+satisfy the second request from the track buffer.
+
+Parrot's I/O system revolves around a request. A request has three
+parts: a buffer for data, a completion routine, and a piece of data
+private to the request. Your program issues the request, then goes about
+its business. When the request is completed, Parrot will call the
+completion routine, passing it the request that just finished. The
+completion routine extracts out the buffer and the private data, and
+does whatever it needs to do to handle the request. If your request
+doesn't have a completion routine, then your program will have to
+explicitly check to see if the request was satisfied.
+
+Your program can choose to sleep and wait for the request to finish,
+essentially blocking. Parrot will continue to process events while
+your program is waiting, so it isn't completely unresponsive. This is
+how Parrot implements synchronous I/O--it issues the asynchronous
+request, then immediately waits for that request to complete.
+
+The reason we made Parrot's I/O system asynchronous by default was
+sheer pragmatism. Network I/O is all asynchronous, as is GUI
+programming, so we knew we had to deal with asynchrony in some form.
+It's also far easier to make an asynchronous system pretend to be
+synchronous than it is the other way around. We could have decided to
+treat GUI events, network I/O, and file I/O all separately, but there
+are plenty of systems around that demonstrate what a bad idea that is.
+
+=head1 ATTACHMENTS
+
+None.
+
+=head1 FOOTNOTES
+
+None.
+
+=head1 REFERENCES
+
+  src/io/io.c
+  src/ops/io.ops
+  include/parrot/io.h
+  runtime/parrot/library/Stream/*
+  src/io/io_unix.c
+  src/io/io_win32.c
+
+=cut
+
+__END__
+Local Variables:
+ fill-column:78
+End:
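
Note on the IMPLEMENTATION section of the added file: the layer lookup it describes (search the layer stack downwards until a non-NULL function pointer is found for a slot) could be sketched roughly as follows. This is an illustrative Python sketch, not Parrot code. The names Layer and find_io_function, the slot names, and the two example layers are all hypothetical, and a plain list stands in for the per-interpreter stack.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Layer:
    """Hypothetical stand-in for one I/O layer and its table of function slots."""

    def __init__(self, name: str, **slots: Optional[Callable]) -> None:
        self.name = name
        # A slot that is absent or None plays the role of a NULL function pointer.
        self.slots: Dict[str, Optional[Callable]] = dict(slots)


def find_io_function(stack: List[Layer], slot: str) -> Tuple[str, Callable]:
    """Walk from the top of the stack downwards; return the first layer filling `slot`."""
    for layer in reversed(stack):  # the last list element is the top of the stack
        fn = layer.slots.get(slot)
        if fn is not None:
            return layer.name, fn
    raise LookupError(f"no layer implements {slot!r}")


# Assumed example stack: an OS-level layer at the bottom, a buffering layer on top.
os_layer = Layer("os", open=lambda path: f"opened {path}", read=lambda n: b"x" * n)
buf_layer = Layer("buf", read=lambda n: b"b" * n, open=None)
stack = [os_layer, buf_layer]

read_owner, read_fn = find_io_function(stack, "read")
open_owner, open_fn = find_io_function(stack, "open")
```

In this sketch the buffering layer answers "read" because it sits on top and fills that slot, while "open" falls through to the OS-level layer because the buffering layer leaves it empty. That is the "subset of the vtable" behaviour the section describes.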
