On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu <[email protected]> wrote:

I've put together over the past days an embryonic streaming interface. It separates transport from formatting, input from output, and buffered from unbuffered operation.

http://erdani.com/d/phobos/std_stream2.html

There are a number of questions interspersed. It would be great to start a discussion using that design as a baseline. Please voice any related thoughts - thanks!

Without reading any other comments, here is my take on just the streaming part (not formatting).

Everything looks good except for two problems:

1. BufferedX should not inherit UnbufferedX. The main reason is that both Buffered *and* Unbuffered can be desirable properties. For example, you may want to *require* that you have a raw stream as a parameter without a buffer. The perfect example is a class which wraps an Unbuffered stream and adds a buffer to it (which is what I'd expect as a class design). You don't want to accept a stream that's already buffered, or you are double-buffering. You can deal with this at runtime by throwing an exception, but I think it's better if such code doesn't even compile.

Now, this removes the possibility of having a function which accepts either an unbuffered or buffered stream. I stipulate that this is not a valid requirement -- your code will work best with one of them, but not both. If you really need to accept either, you can use templates, but I think you will find you always use one or the other even there.
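To make point 1 concrete, here is a minimal sketch of two unrelated interfaces. All of the names (UnbufferedInput, BufferedInput, ArraySource, Buffered, next) are made up for illustration and are not taken from the proposed std_stream2 design.

```d
// Hypothetical names throughout; the proposal itself speaks of BufferedX / UnbufferedX.
interface UnbufferedInput
{
    // Reads up to buf.length bytes; returns how many were read (0 at end of stream).
    size_t read(ubyte[] buf);
}

// Deliberately NOT derived from UnbufferedInput.
interface BufferedInput
{
    // Returns up to max bytes from the internal buffer, refilling it when empty.
    const(ubyte)[] next(size_t max);
}

// A raw in-memory source, standing in for a file or socket.
final class ArraySource : UnbufferedInput
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Adds a buffer to a raw stream. Since Buffered is not an UnbufferedInput,
// handing one Buffered to another Buffered is rejected by the compiler.
final class Buffered : BufferedInput
{
    private UnbufferedInput src;
    private ubyte[] buf;
    private size_t pos, end;

    this(UnbufferedInput src, size_t capacity = 4096)
    {
        this.src = src;
        buf = new ubyte[capacity];
    }

    const(ubyte)[] next(size_t max)
    {
        if (pos == end) { pos = 0; end = src.read(buf); }
        immutable n = max < end - pos ? max : end - pos;
        auto slice = buf[pos .. pos + n];
        pos += n;
        return slice;
    }
}
```

With this layout, new Buffered(new ArraySource(bytes)) compiles, while new Buffered(someBuffered) does not, so double-buffering is caught without any runtime check.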

2. I think it's a mistake to put a range interface directly in the interface. A range can be built with the buffered stream as its core if need be. I have long voiced my opinion that I/O should not implement ranges, and reference types should never be ranges. For example, you are going to implement byLine based not on the range interface, but on the other parts. Why must byLine be an external range, while "byBuffer" is built into the stream? In particular, I think popFront is an odd function for all buffered streams to have to implement.

To voice my opinions on the questions:


-----
Question: Should we offer an open primitive at this level? If so, what parameter(s) should it take?

No, if you need a new stream, create a new instance. The OS processing required to open a file is going to dwarf any performance degradation of creating a new class on the heap. For types that may open quickly (say, an Array input stream), you can provide a function to re-open another array that doesn't have to go in the base interface.

Also note that opening a network stream requires quite different parameters than opening a file. Putting it at the interface level would require some sort of parsed-string parameter, which puts undue responsibility on such a basic interface.

-----
Question: Should we offer a primitive rewind that takes the stream back to the beginning? That might be supported even by some streams that don't support general seek calls. Alternatively, some streams might support seek(0, SeekAnchor.start) but not other calls to seek.

Considering that seek is already callable, even if the stream doesn't support it (because the interface defines it), I don't think it's unreasonable to selectively throw exceptions if the seek isn't possible. In other words, I think seek(0) is acceptable as an alternative to rewind().

However, you may also implement:

final void rewind() { seek(0);}

directly in the interface if necessary.

-----
Question: May we eliminate seekFromCurrent and seekFromEnd and just have seek with absolute positioning? I don't know of streams that allow seek without allowing tell. Even if some stream doesn't, it's easy to add support for tell in a wrapper. The marginal cost of calling tell is small enough compared to the cost of seek.

I don't think the cost of tell is marginal. Support what the OS supports, and all OSes support seeking from the current position. Reducing the number of system calls is preferable.

Also, how to implement seekFromEnd with just tell?

-----
Question: Should this throw on an unopened stream? I don't think so, because throwing does not offer any additional information that user code didn't have, and the idiom if (s.isOpen) s.close() is verbose and frequently encountered.

I agree, don't throw on an unopened stream.

-----
Question: Should we allow read to return an empty slice even if atEnd is false? If we do, we allow non-blocking streams with burst transfer. However, naive client code on non-blocking streams will be inefficient because it would essentially implement busy-waiting.

Why not return an integer so different situations could be designated? That's how the read system call works: you can tell that no data was read, and that the reason is a non-blocking stream rather than EOF.

I realize it's sexy to return the data again so it can be used immediately, but in practice it's more useful to return an integer.

For example, if you want to fill a buffer, you need a loop anyways (there's no guarantee that the first read will fill the buffer), and at that point, you are just going to use the length member of the return value to advance your loop.

I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF, positive on data read, and throw an exception on error.
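As a rough sketch of how the fill loop looks with that return convention (RawInput, fill and the Scripted stand-in are hypothetical names, not part of the proposed interface):

```d
// Hypothetical interface using the return convention suggested above.
interface RawInput
{
    // > 0: bytes read; 0: end of stream; -1: non-blocking stream has no data yet.
    // Errors are reported by throwing.
    ptrdiff_t read(ubyte[] buf);
}

// Fills as much of buf as possible and returns the number of bytes stored.
// Stops early at EOF, or when a non-blocking stream would block (wouldBlock is set).
size_t fill(RawInput input, ubyte[] buf, out bool wouldBlock)
{
    size_t filled = 0;
    wouldBlock = false;
    while (filled < buf.length)
    {
        immutable n = input.read(buf[filled .. $]);
        if (n == 0) break;                          // EOF
        if (n < 0) { wouldBlock = true; break; }    // no data right now
        filled += n;                                // only the count is needed
    }
    return filled;
}

// Stand-in source that never hands out more than `burst` bytes per call,
// the way a pipe or socket might.
final class Scripted : RawInput
{
    private const(ubyte)[] data;
    private size_t burst;

    this(const(ubyte)[] data, size_t burst)
    {
        this.data = data;
        this.burst = burst;
    }

    ptrdiff_t read(ubyte[] buf)
    {
        size_t n = burst < data.length ? burst : data.length;
        if (buf.length < n) n = buf.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return cast(ptrdiff_t) n;
    }
}
```

Note that the loop never looks at the bytes a single read produced, only at how many there were, which is why a returned slice buys little here.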

-----
Question: Should we allow an empty front on a non-empty stream? This goes back to handling non-blocking streams.

Well, streams shouldn't have a range interface anyways, but to answer this specific question, I'd say no. front should fill the buffer if it's empty. This follows the nature of all other ranges, where front is available on creation.

-----
Question: Should we eliminate this function? Theoretically calling advance(n) is equivalent with seekFromCurrent(n). However, in practice a file-based stream will have to implement advance even though the underlying file is not seekable.

I think it's good to have this function. At first, I didn't, but now I realize it's good because advance(n) may be low-performance (it may use read to advance the stream). If you eliminate this function but put its functionality into seekFromCurrent, this makes seekFromCurrent low-performance.

I think you should change the requirements, however, and follow the same return type as I specified above for read (-1 for wouldblock, 0 for EOF, positive for number of bytes 'advanced'). Otherwise, you have issues with non-blocking streams.
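For illustration, a fallback advance on a non-seekable stream might look roughly like this. RawInput, ArrayInput and advanceByReading are hypothetical names, and read is assumed to follow the return convention I suggested for it above.

```d
// Hypothetical interface; read uses the convention suggested earlier.
interface RawInput
{
    ptrdiff_t read(ubyte[] buf);   // > 0 bytes read, 0 EOF, -1 would block
}

// Skips up to n bytes by reading into a scratch buffer and discarding them.
// Returns the number of bytes skipped, 0 if the stream is already at EOF,
// or -1 if nothing could be skipped because a non-blocking stream had no data.
ptrdiff_t advanceByReading(RawInput input, size_t n)
{
    ubyte[512] scratch;
    size_t skipped = 0;
    while (skipped < n)
    {
        immutable want = n - skipped < scratch.length ? n - skipped : scratch.length;
        immutable got = input.read(scratch[0 .. want]);
        if (got < 0) return skipped ? cast(ptrdiff_t) skipped : -1;
        if (got == 0) break;        // EOF: report what was skipped so far
        skipped += got;
    }
    return cast(ptrdiff_t) skipped;
}

// In-memory stand-in for a stream that cannot seek.
final class ArrayInput : RawInput
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    ptrdiff_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return cast(ptrdiff_t) n;
    }
}
```

This costs one read per 512 bytes skipped, which is exactly the kind of cost you would not want hiding behind seekFromCurrent.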

====================

OK, now that I've voiced my opinions on what's there, I'll push the interface I specified some time ago (incidentally, I am building an I/O library based on it). From my current skeleton:


    /**
     * Read data until a condition is satisfied.
     *
     * Buffers data from the input stream until the delegate returns other than
     * ~0. The delegate is passed the data read so far, and the start of the
     * data just read. The delegate should return ~0 if the condition is not
     * satisfied, or the number of bytes that should be returned otherwise.
     *
     * Any data that satisfies the condition will be considered consumed from
     * the stream.
     *
     * params: process = A delegate to determine satisfaction of a condition
     * per the terms above.
     *
     * returns: the data identified by the delegate that satisfies the
     * condition.  Note that this data may be owned by the buffer and so
     * shouldn't be written to or stored for later use without duping.
     */
    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);

The advantage of such an interface is that it creates a very efficient way to specify how to buffer the data based on the data (i.e. byLine comes to mind).
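Here is a small sketch of how byLine-style reading could sit on top of readUntil. ChunkedBuffer, lineEnd and readLine are hypothetical names, the stream is simulated with an in-memory array delivered in fixed-size chunks, and the end-of-stream behavior (returning whatever is left) is my assumption rather than something the doc comment specifies.

```d
import std.functional : toDelegate;

// Condition for readUntil: scans only the newly read bytes for '\n'.
uint lineEnd(ubyte[] data, uint start)
{
    foreach (i; start .. data.length)
        if (data[i] == '\n')
            return cast(uint)(i + 1);   // return everything through the newline
    return ~0u;                          // not satisfied, keep buffering
}

// In-memory stand-in for a buffered stream; data "arrives" chunk bytes at a time.
final class ChunkedBuffer
{
    private ubyte[] source;   // bytes not yet read from the underlying stream
    private ubyte[] buffer;   // bytes buffered but not yet consumed
    private size_t chunk;

    this(ubyte[] source, size_t chunk)
    {
        this.source = source;
        this.chunk = chunk;
    }

    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process)
    {
        uint start = 0;
        for (;;)
        {
            immutable used = process(buffer, start);
            if (used != ~0u)
            {
                auto result = buffer[0 .. used];
                buffer = buffer[used .. $];     // matched data is consumed
                return result;
            }
            if (source.length == 0)
            {
                // Assumption: at end of stream, hand back whatever is left.
                auto rest = buffer;
                buffer = null;
                return rest;
            }
            start = cast(uint) buffer.length;   // the delegate rescans nothing
            immutable n = chunk < source.length ? chunk : source.length;
            buffer ~= source[0 .. n];
            source = source[n .. $];
        }
    }

    // Convenience wrapper: one line per call, newline included.
    ubyte[] readLine() { return readUntil(toDelegate(&lineEnd)); }
}
```

Because the delegate receives start, it only examines each byte once no matter how many refills a long line takes.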

Here is a second function that does the same as above but appends the data directly to a user-supplied buffer:

size_t appendUntil(uint delegate(ubyte[] data, uint start) process, ref ubyte[] arr);

-Steve
