Re: std.stdio overhaul by Steve Schveighoffer

Steven Schveighoffer Sat, 03 Sep 2011 19:00:43 -0700

On Sat, 03 Sep 2011 15:54:05 -0400, Andrei Alexandrescu wrote:

> Hello,
> There are a number of issues related to D's current handling of streams, including the existence of the imperfect etc.stream and the over-specialization of std.stdio.
> Steve has worked on an extensive overhaul of std.stdio which would obviate the need for etc.stream and would improve both the generality and efficiency of std.stdio.
> Please chime in with feedback; he's away from the Usenet but allowed me to post this on his behalf. I uploaded the docs to
> http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html

Thank you Andrei for posting this. Before I add some more details, let me first say: this is a very early version, but it does work (and spanks the pants off of the current stdio in the tests I've run).


I'll add several very important things:

1. At the moment, this is written for Linux *ONLY*. I have very good experience with Windows i/o, and I am 100% certain I can implement this library for it. However, it's not my main OS, so I wanted to first get something working in my main working environment.

2. This is *not* currently multithread aware. But it will be. However, I think one important aspect to consider is making the i/o library *thread-local* aware, to avoid unnecessary locking when an i/o connection is only used in one thread. But please leave that part alone for now; I'm working on how to make the code reusable as shared types. Actually, if anyone has good ideas on that, please share!

3. Although I am dead-set on getting *something* into Phobos, I am not attached at all to the symbol names, or even to some major design choices. From what I have seen so far, naming is one of the major concerns, and I think we can find good names. The names I came up with are not exactly arbitrary, but they are somewhat based on earlier designs that I have since abandoned, so renaming is definitely in order.

4. You can get the full source here: https://github.com/schveiguy/phobos/tree/new-io
I used the stock 2.054 compiler and a version of druntime that includes Lars' new std.process changes, also on my github account: https://github.com/schveiguy/druntime/tree/new-std-process
Please use those when trying out the code.
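Point 2 leans on a D language rule worth spelling out: module-level variables are thread-local unless declared `shared` or `__gshared`, so state used by only one thread needs no lock. The sketch below only illustrates that rule; `bytesBuffered`, `sharedTotal`, `fakeWrite` and `worker` are hypothetical names, not part of the proposed library.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Module-level variables are thread-local by default in D: every thread
// gets its own copy initialized to size_t.init (0), so no lock is needed.
size_t bytesBuffered;

// __gshared opts out: one copy for all threads, so real code would have to
// synchronize access. Used here only to report the worker's result.
__gshared size_t sharedTotal;

// Hypothetical stand-in for a buffered write that updates per-thread state.
void fakeWrite(size_t n) { bytesBuffered += n; }

void worker()
{
    fakeWrite(5);                  // updates the worker thread's own copy
    sharedTotal = bytesBuffered;
}

void main()
{
    fakeWrite(10);
    auto t = new Thread(&worker);
    t.start();
    t.join();                      // wait for the worker to finish
    writeln(bytesBuffered, " ", sharedTotal);   // 10 5
}
```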


--------------------------

So let me tell you about the library design and why I did it the way I did it. Then, I'll respond to individual concerns already posted.

The major problem I think the current std.stdio has is that its buffered solution is based on C's FILE * implementation. Specifically, we have very little control over, and access to, the buffer implementation. I think the key (or at least one of the keys) to uber-fast I/O is copying as little as possible *needlessly*, and I think seamless and safe buffer access is the key to that. In addition, C's FILE * has several limitations:
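To make "seamless and safe buffer access" concrete, here is a minimal sketch of a reader that lends out slices of its own buffer, so a line splitter can look at bytes in place instead of copying them out. Everything here (`WindowedReader`, `window`, `consume`, `fill`, `nextLine`) is a hypothetical illustration, not the API in the new-io branch, and an in-memory array stands in for the device.

```d
import std.algorithm.comparison : min;
import std.algorithm.searching : countUntil;
import std.stdio : writeln;

/// Hypothetical reader that lends out slices of its own buffer.
final class WindowedReader
{
    private const(ubyte)[] source;   // stands in for a low-level device
    private ubyte[] buf;
    private size_t start, end;       // unread data is buf[start .. end]

    this(const(ubyte)[] source, size_t capacity)
    {
        assert(capacity > 0, "capacity must be positive");
        this.source = source;
        buf = new ubyte[capacity];
    }

    /// Unread bytes, viewed in place (no copy).
    const(ubyte)[] window() const { return buf[start .. end]; }

    /// Mark n bytes of the window as used.
    void consume(size_t n) { start += n; }

    bool empty() const { return start == end && source.length == 0; }

    /// Slide unread bytes to the front, grow if the buffer is full, then pull
    /// more from the source. Returns the number of new bytes (0 at end).
    /// Slices previously returned by window() may be invalidated.
    size_t fill()
    {
        immutable remaining = end - start;
        foreach (i; 0 .. remaining)      // forward copy is safe for this overlap
            buf[i] = buf[start + i];
        start = 0;
        end = remaining;
        if (end == buf.length)
            buf.length = buf.length * 2;
        immutable n = min(buf.length - end, source.length);
        buf[end .. end + n] = source[0 .. n];
        source = source[n .. $];
        end += n;
        return n;
    }
}

/// Next line without its '\n', as a slice into the reader's buffer.
/// The slice is only valid until the next fill().
const(ubyte)[] nextLine(WindowedReader r)
{
    while (true)
    {
        auto w = r.window;
        immutable idx = w.countUntil(cast(ubyte) '\n');
        if (idx >= 0)
        {
            immutable len = cast(size_t) idx;
            r.consume(len + 1);
            return w[0 .. len];
        }
        if (r.fill() == 0)               // end of input: hand back what is left
        {
            auto rest = r.window;
            r.consume(rest.length);
            return rest;
        }
    }
}

void main()
{
    auto r = new WindowedReader(cast(const(ubyte)[]) "alpha\nbeta\ngamma", 4);
    while (!r.empty)
        writeln(cast(const(char)[]) nextLine(r));
}
```

The only copies are the unavoidable ones from the device into the buffer and the occasional slide of unread bytes to the front; callers that can work on the borrowed slice never copy at all.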

1. On Windows, it's based on DMC's runtime, which limits you to 60 simultaneous open files (the Windows OS limit is 10,000, I think).

2. 64-bit support is not standard in all C implementations (notably Windows).

3. All FILE * objects are inherently shared, meaning lock-free I/O is very cumbersome, especially considering we have D's shared/unshared system.

4. C supports UTF-8, and it's supposed to support UTF-16 (but I can't get UTF-16 to work). I think D ought to support all forms of UTF, since UTF is an integral part of the language.
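On point 4, D already has a distinct string type for each UTF width (`string`, `wstring`, `dstring`), and Phobos' `std.conv.to` converts between them, so a D text layer can in principle accept or produce any of them. The helpers `transcode` and `codePoints` are hypothetical names used only for this illustration.

```d
import std.conv : to;
import std.stdio : writefln;

/// Hypothetical helper: re-encode any D string type as UTF-8, UTF-16 or
/// UTF-32, selected by Target = char, wchar or dchar.
immutable(Target)[] transcode(Target, S)(S text)
{
    return text.to!(immutable(Target)[]);
}

/// Count code points; `foreach (dchar c; ...)` decodes any UTF width.
size_t codePoints(S)(S text)
{
    size_t n;
    foreach (dchar c; text)
        ++n;
    return n;
}

void main()
{
    string s = "D \u00e9 \U0001F600";   // 1-, 2- and 4-byte UTF-8 sequences
    wstring w = transcode!wchar(s);
    dstring d = transcode!dchar(s);
    writefln("UTF-8: %s units, UTF-16: %s, UTF-32: %s, code points: %s",
             s.length, w.length, d.length, codePoints(w));
}
```

The emoji needs a surrogate pair in UTF-16, which is exactly the kind of case a C library with shaky UTF-16 support tends to get wrong.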

In addition to this, we have numerous D tools at our disposal: delegates, closures, ranges, etc. Limiting ourselves to C's interfaces means either duct-taping those features on, or abandoning them. A prime example is the LockingFileReader range in std.stdio: while it is a noble effort, and probably the best we could get, just reading it made me cringe. Have a look: https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1282


I felt we must be able to do something better.

So I started creating what I thought would be a good i/o library. I did not start from the existing code, but just rewrote everything. The basic concept is: we implement buffering once, and implement low-level devices that can be wrapped by the buffering implementation. Almost everything that uses I/O wants a buffered version of it, so make the low-level aggregate minimal, and put all the useful functionality into the buffer. I also wanted to make sure it is very easy to implement *efficient* ranges.
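A minimal sketch of "implement buffering once, wrap minimal devices", assuming the device interface needs nothing but a raw `read`. The names `RawInput`, `MemoryInput` and `Buffered` are hypothetical, and the memory-backed device stands in for a real file descriptor. The byte range shows why ranges over a buffer can be cheap: `popFront` only advances an index, and the device is touched only when the buffer runs dry.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

/// Hypothetical minimal device interface: the only operation is a raw read.
interface RawInput
{
    /// Read up to buf.length bytes into buf; returns 0 at end of input.
    size_t read(ubyte[] buf);
}

/// In-memory device standing in for an OS file descriptor.
final class MemoryInput : RawInput
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

/// Buffering written once, usable with any RawInput implementation.
final class Buffered
{
    private RawInput dev;
    private ubyte[] buf;
    private size_t pos, len;
    private bool eof;
    size_t deviceCalls;   // counts trips to the device, for illustration

    this(RawInput dev, size_t capacity = 4096)
    {
        this.dev = dev;
        buf = new ubyte[capacity];
    }

    /// Refill when drained; returns how many bytes are available.
    private size_t available()
    {
        if (pos == len && !eof)
        {
            ++deviceCalls;
            len = dev.read(buf);
            pos = 0;
            eof = len == 0;
        }
        return len - pos;
    }

    /// Byte-at-a-time input range over the buffer.
    auto bytes()
    {
        static struct Range
        {
            Buffered b;
            bool empty() { return b.available() == 0; }
            ubyte front() { return b.buf[b.pos]; }
            void popFront() { ++b.pos; }
        }
        return Range(this);
    }
}

void main()
{
    auto input = new Buffered(new MemoryInput(cast(const(ubyte)[]) "hello world"), 4);
    size_t count;
    foreach (c; input.bytes)
        ++count;
    writeln(count, " bytes, ", input.deviceCalls, " device reads");
}
```

With a 4-byte buffer, the 11 bytes of "hello world" cost four device reads: three that return data and one that reports end of input.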

One design decision made early on is that the device level should be a class. There are a few good reasons for this:

1. An I/O device is a reference type. Copying it does not open another handle. So even if we *wanted* structs, they would be pImpl structs.

2. One simple idea that works very well at the OS level is the file descriptor concept. The file descriptor provides an *interface* to user code for operating on a stream, and file descriptors are easily interchangeable. This means an fd could be a network socket, a file, a pipe, or a COM port, and the basic interface never changes. So we should use that same concept: define a simple interface for a low-level device, and then implement the buffer around that interface. Since classes are the only types in D which support interfaces, I chose them.
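The file-descriptor idea in point 2 maps directly onto a D interface: code written against the interface does not care which device sits behind it, and copying a class reference shares one handle rather than opening another (point 1). `RawOutput`, `MemoryOutput`, `NullOutput` and `sendGreeting` are hypothetical names for this sketch.

```d
import std.stdio : writeln;

/// Hypothetical minimal output-device interface, in the spirit of a file
/// descriptor: a socket, pipe or file would all implement the same call.
interface RawOutput
{
    /// Write bytes; returns how many were accepted.
    size_t write(const(ubyte)[] data);
}

/// Device that keeps everything written to it in memory.
final class MemoryOutput : RawOutput
{
    ubyte[] written;
    size_t write(const(ubyte)[] data) { written ~= data; return data.length; }
}

/// Device that discards data but counts it.
final class NullOutput : RawOutput
{
    size_t total;
    size_t write(const(ubyte)[] data) { total += data.length; return data.length; }
}

/// Written once against the interface; works with any device.
void sendGreeting(RawOutput dev)
{
    dev.write(cast(const(ubyte)[]) "hello\n");
}

void main()
{
    auto mem = new MemoryOutput;
    RawOutput sameHandle = mem;   // copies the reference, not the device
    sendGreeting(mem);
    sendGreeting(sameHandle);     // writes to the very same object
    auto sink = new NullOutput;
    sendGreeting(sink);
    writeln(cast(const(char)[]) mem.written, sink.total, " bytes discarded");
}
```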

Yes, I know classes suffer from the dreaded "I don't know when the GC is going to get around to closing this file" problem. I think, though, that we have ways to mitigate that (I'll post some responses to points about that elsewhere in the thread).
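The mitigations referred to above are discussed elsewhere in the thread and are not reproduced here. As a generic illustration only, D already offers two deterministic tools: `scope(exit)` to close a device when a scope ends, and `destroy` to run a class destructor immediately instead of at GC time. `Device`, `useBriefly` and `releasedHandles` are hypothetical.

```d
import std.stdio : writeln;

size_t releasedHandles;   // how many handles were actually released

/// Hypothetical device that only tracks whether its handle is open.
final class Device
{
    private bool open = true;
    bool isOpen() const { return open; }

    /// Release the handle; safe to call more than once.
    void close()
    {
        if (open)
        {
            open = false;
            ++releasedHandles;
        }
    }

    /// Last resort: runs only when the GC finalizes the object, which may be
    /// much later than the caller expects, if ever.
    ~this() { close(); }
}

Device useBriefly()
{
    auto dev = new Device;
    scope (exit) dev.close();   // runs when this function exits, GC or not
    // ... read or write through dev here ...
    return dev;                 // returned only so the caller can inspect it
}

void main()
{
    auto d = useBriefly();
    writeln("open after scope(exit): ", d.isOpen);
    auto e = new Device;
    destroy(e);                 // runs the destructor now, not at GC time
    writeln("handles released: ", releasedHandles);
}
```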

One other important design decision I made was that the standard handles *must* be changeable at runtime to C-based i/o. This was mainly to appease Walter, as he insists on having I/O compatible with C functions (such as printf). I think he has a good point, but I think limiting this to basically the standard handles is the right level of compatibility.
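One way to picture a standard handle that can be switched to C-based i/o at runtime: the handle is a reference to a backend interface, and switching flushes the current backend and replaces it. This is a hypothetical sketch (`Sink`, `NativeSink`, `CSink`, `stdOut` and `useCStdio` are invented names), and like the current draft it assumes a POSIX system, since the native backend writes to file descriptor 1 with `core.sys.posix.unistd.write`.

```d
import core.stdc.stdio : fflush, fwrite, printf, stdout;

/// Hypothetical backend interface for a standard handle.
interface Sink
{
    void put(const(char)[] s);
    void flush();
}

/// Native backend: buffers in D and writes straight to fd 1 (POSIX only).
final class NativeSink : Sink
{
    private char[] pending;
    void put(const(char)[] s) { pending ~= s; }
    void flush()
    {
        import core.sys.posix.unistd : write, STDOUT_FILENO;
        if (pending.length)
            write(STDOUT_FILENO, pending.ptr, pending.length);
        pending.length = 0;
    }
}

/// C backend: forwards to C's FILE* so output stays ordered with printf.
final class CSink : Sink
{
    void put(const(char)[] s) { fwrite(s.ptr, 1, s.length, stdout); }
    void flush() { fflush(stdout); }
}

__gshared Sink stdOut;   // hypothetical standard output handle

shared static this() { stdOut = new NativeSink; }

/// Switch the standard handle to C-compatible mode at runtime.
void useCStdio()
{
    stdOut.flush();      // don't lose anything buffered natively
    stdOut = new CSink;
}

void main()
{
    useCStdio();
    stdOut.put("from D, ");
    printf("from printf\n");   // both now go through C's stdout buffer
    stdOut.flush();
}
```

After the switch, D output and `printf` output share C's buffer, which is why they come out in program order; a native backend that buffers independently gives no such guarantee.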

After going through many iterations (you can look at the github history if you are interested), I settled on this basic tree. Note that I'm very open to changing any part of this, as long as the basic concept of a common buffer type surrounding a low-level device type is kept intact.


interface Seekable => an interface defining seek functions for a device.

interface InputStream : Seekable => an interface defining functions that can be called on an input device. This is non-buffered.

interface OutputStream : Seekable => an interface defining functions that can be called on an output device. Also non-buffered.

class File : InputStream, OutputStream => The implementation for the OS handle-based input/output stream. This is akin to a file descriptor. (Note: I realize this is a poor name choice; it should probably be changed.)

final class DInput => The buffered input stream. This implements the buffer which surrounds an InputStream.

final class DOutput => The buffered output stream. This implements the buffer which surrounds an OutputStream.

final class CStream => A buffered input and output stream based on C's FILE *. This is used if you want to be compatible with C input or output, and is used in TextInput and TextOutput when using the C standard handles.

struct TextInput => A text-based input stream. This implements UTF translation of all forms and handles formatted input. Main member function is readf.

struct TextOutput => A text-based output stream. This implements UTF translation of all forms and handles formatted output. Main member functions are the write* family.
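The shape of this tree can be sketched in D. Only the type names `Seekable`, `InputStream`, `OutputStream`, `DOutput` and `TextOutput` come from the list above; every signature, the `Anchor` enum and `MemoryDevice` (an in-memory stand-in for `File`, so the sketch runs without touching the file system) are hypothetical. The input side (`DInput`, `TextInput`) would mirror the output side, and `CStream` is left out for brevity.

```d
import std.algorithm.comparison : min;
import std.format : formattedWrite;
import std.stdio : writeln;

// Only the type names come from the proposal; all signatures are hypothetical.
enum Anchor { begin, current, end }

interface Seekable     { ulong seek(long delta, Anchor from); }
interface InputStream  : Seekable { size_t read(ubyte[] data); }           // unbuffered
interface OutputStream : Seekable { size_t write(const(ubyte)[] data); }   // unbuffered

/// In-memory device playing the role of File; one seek() serves both interfaces.
final class MemoryDevice : InputStream, OutputStream
{
    ubyte[] data;
    private size_t pos;

    ulong seek(long delta, Anchor from)
    {
        immutable long base = from == Anchor.begin ? 0
            : from == Anchor.current ? cast(long) pos : cast(long) data.length;
        assert(base + delta >= 0, "cannot seek before the start");
        pos = cast(size_t)(base + delta);
        return pos;
    }

    size_t read(ubyte[] buf)
    {
        immutable n = pos < data.length ? min(buf.length, data.length - pos) : 0;
        buf[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }

    size_t write(const(ubyte)[] bytes)
    {
        if (pos + bytes.length > data.length)
            data.length = pos + bytes.length;
        data[pos .. pos + bytes.length] = bytes;
        pos += bytes.length;
        return bytes.length;
    }
}

/// Buffered output around any OutputStream.
final class DOutput
{
    private OutputStream dev;
    private ubyte[] buf;
    private size_t used;

    this(OutputStream dev, size_t capacity = 4096) { this.dev = dev; buf = new ubyte[capacity]; }

    void put(const(ubyte)[] bytes)
    {
        if (used + bytes.length > buf.length)
            flush();                      // make room first to keep ordering
        if (bytes.length > buf.length)
        {
            dev.write(bytes);             // too large to buffer: write through
            return;
        }
        buf[used .. used + bytes.length] = bytes;
        used += bytes.length;
    }

    void flush()
    {
        if (used)
            dev.write(buf[0 .. used]);
        used = 0;
    }
}

/// Text layer on top of the buffer (UTF translation omitted in this sketch).
struct TextOutput
{
    DOutput output;
    void put(const(char)[] s) { output.put(cast(const(ubyte)[]) s); }
    void writef(Args...)(const(char)[] fmt, Args args) { formattedWrite(this, fmt, args); }
}

void main()
{
    auto dev = new MemoryDevice;
    auto text = TextOutput(new DOutput(dev, 8));
    text.writef("%s + %s = %s\n", 2, 2, 4);
    text.output.flush();
    writeln(cast(const(char)[]) dev.data);
}
```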

It seems like a lot. But keep in mind that almost everyone will only ever use DInput, DOutput, TextInput and TextOutput. These replace the current std.stdio.File. The low-level device types exist for implementing low-level devices; they are not really meant to be used directly, except to wrap in a buffer. I expect that convenience functions will exist to create the correct buffered stream when given the right parameters. The most obvious example is the function openFile (which is included). The nice thing is that, thanks to the auto return feature and templates, this takes care of some of the mess of having 4 main types to deal with.
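The auto-return point can be illustrated with a hypothetical `openFile` whose mode is a template argument: `static if` picks the return type at compile time, and `auto` hides which of the four types the caller gets. The signature and the mode strings are invented for this sketch and may differ from the `openFile` in the branch; the four types are stubs that only record a file name, and no real I/O happens.

```d
import std.stdio : writeln;

/// Stubs standing in for the proposal's types; they only record a name.
final class DInput  { string name; this(string name) { this.name = name; } }
final class DOutput { string name; this(string name) { this.name = name; } }
struct TextInput  { DInput  input; }
struct TextOutput { DOutput output; }

/// Hypothetical factory: the mode is a compile-time argument, so each
/// instantiation has exactly one return type, deduced by `auto`.
auto openFile(string mode = "r")(string name)
{
    static if (mode == "r")
        return new DInput(name);
    else static if (mode == "w")
        return new DOutput(name);
    else static if (mode == "rt")
        return TextInput(new DInput(name));
    else static if (mode == "wt")
        return TextOutput(new DOutput(name));
    else
        static assert(0, "unsupported mode: " ~ mode);
}

void main()
{
    auto bin = openFile("data.bin");       // a DInput
    auto log = openFile!"wt"("log.txt");   // a TextOutput
    writeln(typeof(bin).stringof, " ", typeof(log).stringof);
}
```

An unsupported mode string fails at compile time through the `static assert`, rather than at run time.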

I want to reiterate: I have created something that works, not something that is perfect. I want everyone's input on how it should be changed, including major design decisions. I'm open to changing just about everything. The *only* major concept I want to keep is the buffering surrounding a low-level device.

Thanks for taking the time to look at this. I hope it will become good enough to be included in Phobos. I plan to do everything I can to make it happen.


-Steve
