On Sat, 03 Sep 2011 15:54:05 -0400, Andrei Alexandrescu
<[email protected]> wrote:
> Hello,
> There are a number of issues related to D's current handling of streams,
> including the existence of the imperfect etc.stream and the
> over-specialization of std.stdio.
> Steve has worked on an extensive overhaul of std.stdio which would
> obviate the need for etc.stream and would improve both the generality
> and efficiency of std.stdio.
> Please chime in with feedback; he's away from the Usenet but allowed me
> to post this on his behalf. I uploaded the docs to
> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html
Thank you, Andrei, for posting this. Before I add some more details, let me
first say that this is a very early version, but it does work (and spanks the
pants off of the current stdio in the tests I've run).
I'll add several very important things:
1. At the moment, this is written for Linux *ONLY*. I have very good
experience with Windows I/O, and I am 100% certain I can implement this
library for it. However, it's not my main OS, so I wanted to get
something working in my main working environment first.
2. This is *not* currently multithread-aware, but it will be. One important
aspect to consider, though, is making the library *thread-local* aware, to
avoid unnecessary locking when an I/O connection is only used in one thread.
But please leave that part alone for now; I'm working on how to make the code
reusable as shared types. Actually, if anyone has good ideas on that, please
share!
3. Although I am dead-set on getting *something* into Phobos, I am not
attached at all to the symbol names, or even to some major design choices.
From what I have seen so far, naming is one of the major concerns, and I think
we can find good names. The names I came up with are not exactly arbitrary,
but they are partly based on earlier designs that I have since abandoned, so
renaming is definitely in order.
4. You can get the full source here:
https://github.com/schveiguy/phobos/tree/new-io
I used the stock 2.054 compiler and a version of druntime that includes
Lars' new std.process changes, also on my github account:
https://github.com/schveiguy/druntime/tree/new-std-process
Please use those when trying out the code.
--------------------------
Let me tell you about the library design and why I designed it the way I
did. Then I'll respond to individual concerns already posted.
The major problem I see with the current std.stdio is that its buffered
solution is based on C's FILE * implementation. Specifically, we have
very little control over, or access to, the buffer implementation. I think
the key (or at least one of the keys) to uber-fast I/O is avoiding needless
copying, and seamless, safe access to the buffer is how you get there. In
addition to that, C's FILE * has several limitations:
1. On Windows, it's based on DMC's runtime, which limits you to 60
simultaneous open files (the Windows OS limit is 10,000, I think).
2. 64-bit file offset support is not standard in all C implementations
(notably Windows).
3. All FILE * objects are inherently shared, which makes lock-free I/O very
cumbersome, especially considering that D has a shared/unshared type system.
4. C supports UTF-8, and it's supposed to support UTF-16 (but I can't get
UTF-16 to work). I think D ought to support all forms of UTF, since UTF
is an integral part of the language.
In addition to this, we have numerous D tools at our disposal: delegates,
closures, ranges, etc. Limiting ourselves to C's interfaces means either
duct-taping those features on or abandoning them. A prime example is the
LockingFileReader range in std.stdio. It is a noble effort, and probably the
best we could get, but just reading it made me cringe. Have a look:
https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1282
I felt we must be able to do something better.
So I started creating what I thought would be a good I/O library. I did
not start from the existing code; I rewrote everything. The basic concept
is that we implement buffering once, and implement low-level devices that
can be wrapped by the buffering implementation. Almost everything that
uses I/O wants a buffered version of it, so keep the low-level aggregate
minimal and put all the useful functionality into the buffer. I also wanted
to make sure it is very easy to implement *efficient* ranges.
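To make the "implement buffering once, wrap any device" idea concrete, here is a rough sketch in Python rather than D, purely for brevity. It is not the proposed D API: InputDevice, MemoryDevice, BufferedInput, read_into and by_line are hypothetical names. The device only knows how to fill a caller-supplied buffer; all buffering lives in one wrapper, which hands out line views into its own buffer instead of copying each line.

```python
from abc import ABC, abstractmethod


class InputDevice(ABC):
    """Hypothetical minimal, unbuffered device interface."""

    @abstractmethod
    def read_into(self, buf: memoryview) -> int:
        """Fill as much of buf as is available; return the count, 0 at EOF."""


class MemoryDevice(InputDevice):
    """Toy device over in-memory bytes; returns short reads like a pipe might."""

    def __init__(self, data: bytes, max_chunk: int = 3):
        self._data, self._pos, self._max = data, 0, max_chunk

    def read_into(self, buf: memoryview) -> int:
        chunk = self._data[self._pos:self._pos + min(len(buf), self._max)]
        buf[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


class BufferedInput:
    """Buffering written once, usable with any InputDevice."""

    def __init__(self, device: InputDevice, size: int = 4096):
        self._dev = device
        self._buf = bytearray(size)
        self._start = 0  # index of the first unconsumed byte
        self._end = 0    # index one past the last valid byte

    def _fill(self) -> bool:
        """Compact, grow if full, then read more. Returns False at EOF."""
        pending = self._end - self._start
        if self._start:
            # Only the unconsumed tail is copied, not the whole buffer.
            self._buf[:pending] = self._buf[self._start:self._end]
            self._start, self._end = 0, pending
        if self._end == len(self._buf):
            # Replace rather than resize: a bytearray with live views can't resize.
            self._buf = self._buf + bytearray(len(self._buf))
        n = self._dev.read_into(memoryview(self._buf)[self._end:])
        self._end += n
        return n > 0

    def by_line(self):
        """Yield each line (newline stripped) as a read-only view into the buffer.

        A view is only valid until the next line is requested.
        """
        while True:
            nl = self._buf.find(b"\n", self._start, self._end)
            if nl >= 0:
                line = memoryview(self._buf)[self._start:nl].toreadonly()
                self._start = nl + 1
                yield line
            elif not self._fill():
                if self._start < self._end:  # last line had no trailing newline
                    yield memoryview(self._buf)[self._start:self._end].toreadonly()
                    self._start = self._end
                return
```

Usage: `for line in BufferedInput(device).by_line(): ...`. The price of not copying is that each view is only valid until the next line is requested, so call bytes(line) on any line you want to keep.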
One early design decision was that the device level should be a class.
There are a few good reasons for this:
1. An I/O device is a reference type. Copying it does not open another
handle, so even if we *wanted* structs, they would be pImpl structs.
2. One simple idea that works very well at the OS level is the file
descriptor concept. The file descriptor provides an *interface* to user
code for operating on a stream, and descriptors are easily interchangeable:
an fd could be a network socket, a file, a pipe, or a COM port, and the
basic interface never changes. So we should use that same concept: define
a simple interface for a low-level device, and then implement the buffer
around that interface. Since classes are the only D types that can
implement interfaces, I chose them.
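A minimal sketch of the file-descriptor idea, again in Python for brevity and assuming a POSIX-like os module. The abstract classes borrow the interface names from the type tree in this post (Seekable, InputStream, OutputStream), but they are illustrative stand-ins; FdStream and copy_all are hypothetical. copy_all only talks to the interfaces, so it behaves the same whether the descriptor is a pipe, a regular file or a socket.

```python
import os
from abc import ABC, abstractmethod


class Seekable(ABC):
    @abstractmethod
    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int: ...


class InputStream(Seekable):
    @abstractmethod
    def read(self, n: int) -> bytes: ...


class OutputStream(Seekable):
    @abstractmethod
    def write(self, data: bytes) -> int: ...


class FdStream(InputStream, OutputStream):
    """Unbuffered device over a raw OS file descriptor (file, pipe, socket...)."""

    def __init__(self, fd: int):
        self.fd = fd

    def read(self, n: int) -> bytes:
        return os.read(self.fd, n)

    def write(self, data: bytes) -> int:
        return os.write(self.fd, data)  # may be a short write

    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int:
        # On POSIX systems, lseek on a pipe fails with OSError (ESPIPE).
        return os.lseek(self.fd, offset, whence)

    def close(self) -> None:
        os.close(self.fd)


def copy_all(src: InputStream, dst: OutputStream, chunk: int = 4096) -> int:
    """Copy until EOF using only the interfaces; returns bytes written."""
    total = 0
    while data := src.read(chunk):
        while data:  # retry until short writes have drained
            n = dst.write(data)
            total += n
            data = data[n:]
    return total
```

For example, copying from the read end of os.pipe() into a descriptor from tempfile.mkstemp() needs no special-casing; only seek() behaves differently, because pipes cannot seek.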
Yes, I know classes suffer from the dreaded "I don't know when the GC is
going to get around to closing this file" problem. I think, though, that we
have ways to mitigate that (I'll post some responses to points about that
elsewhere in the thread).
One other important design decision I made was that the standard handles
*must* be switchable at runtime to C-based I/O. This was mainly to appease
Walter, as he insists on I/O that is compatible with C functions (such as
printf). He has a good point, but I think limiting this to the standard
handles is the right level of compatibility.
After going through many iterations (you can look at the github history if
you are interested), I settled on this basic tree. Note that I'm very
open to changing any parts of this, as long as the basic concept of a
common buffer type surrounding a low-level device type is kept intact.
interface Seekable => an interface defining seek functions for a device.
interface InputStream : Seekable => an interface defining functions that
can be called on an input device. This is non-buffered.
interface OutputStream : Seekable => an interface defining functions that
can be called on an output device. Also non-buffered.
class File : InputStream, OutputStream => The implementation for the OS
handle-based input/output stream. This is akin to a file descriptor.
(Note: I realize this is a poor name choice for this type; it should
probably be changed.)
final class DInput => The buffered input stream. This implements the
buffer which surrounds an InputStream.
final class DOutput => The buffered output stream. This implements the
buffer which surrounds an OutputStream.
final class CStream => A buffered input and output stream based on C's
FILE *. This is used if you want to be compatible with C input or output,
and is used in TextInput and TextOutput when using the C standard handles.
struct TextInput => A text-based input stream. This implements UTF
translation of all forms and handles formatted input. Main member
function is readf.
struct TextOutput => A text-based output stream. This implements UTF
translation of all forms and handles formatted output. Main member
functions are the write* family.
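A sketch of what the text layer adds on top of a byte-oriented stream: formatting plus translation into one of the UTF forms. This is a hypothetical Python stand-in, not the TextOutput API from the docs; it assumes little-endian UTF-16/UTF-32 without a byte-order mark, and uses Python's %-formatting in place of D's format strings. The sink can be any object with a write(bytes) method, such as a buffered output stream.

```python
class TextOutput:
    """Hypothetical text layer: formats values and transcodes to one UTF form."""

    # Assumption: little-endian, no byte-order mark, for the 16- and 32-bit forms.
    ENCODINGS = {"utf-8": "utf-8", "utf-16": "utf-16-le", "utf-32": "utf-32-le"}

    def __init__(self, sink, utf: str = "utf-8"):
        self._sink = sink  # any byte-oriented output with write(bytes)
        self._codec = self.ENCODINGS[utf]

    def write(self, *args) -> None:
        """Convert each argument to text, then encode once and emit the bytes."""
        self._sink.write("".join(str(a) for a in args).encode(self._codec))

    def writeln(self, *args) -> None:
        self.write(*args, "\n")

    def writef(self, fmt: str, *args) -> None:
        """printf-style formatting via Python's % operator."""
        self.write(fmt % args)
```

For example, writing "π = 3" through TextOutput(io.BytesIO(), "utf-32") produces 4 bytes per code point, while "utf-8" uses 2 bytes for the π and 1 for each other character.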
It seems like a lot. But keep in mind that almost everyone will only ever
use DInput, DOutput, TextInput and TextOutput. These replace the current
std.stdio.File. The low-level device types exist for implementing new
low-level devices; they are not really meant to be used directly, except to
be wrapped in a buffer.
I expect that convenience functions will exist to create the correct
buffered stream when given the right parameters. The most obvious example
is the function openFile (which is included). The nice thing is that, thanks
to auto return types and templates, this hides some of the mess of having
four main types to deal with.
I want to reiterate: I have created something that works, not something
that is perfect. I want everyone's input on how it should be changed --
including major design decisions. I'm open to changing just about
everything. The *only* major concept I want to keep is the buffering
surrounding a low-level device.
Thanks for taking the time to look at this. I hope it will become good
enough to be included in Phobos. I plan to do everything I can to make it
happen.
-Steve