Re: streaming redux

Steven Schveighoffer Wed, 29 Dec 2010 07:55:26 -0800

On Tue, 28 Dec 2010 02:02:29 -0500, Andrei Alexandrescu<[email protected]> wrote:

I've put together over the past days an embryonic streaming interface. It separates transport from formatting, input from output, and buffered from unbuffered operation.
http://erdani.com/d/phobos/std_stream2.html
There are a number of questions interspersed. It would be great to start a discussion using that design as a baseline. Please voice any related thoughts - thanks!

Without reading any other comments, here is my take on just the streaming part (not formatting).


Everything looks good except for two problems:

1. BufferedX should not inherit UnbufferedX. The main reason is that both Buffered *and* Unbuffered can be desirable properties. For example, you may want to *require* a raw stream without a buffer as a parameter. The perfect example is a class which wraps an Unbuffered stream and adds a buffer to it (which is what I'd expect as a class design). You don't want to accept a stream that's already buffered, or you are double-buffering. You can deal with this at runtime by throwing an exception, but I think it's better to disallow this from even compiling.

Now, this removes the possibility of having a function which accepts either an unbuffered or buffered stream. I stipulate that this is not a valid requirement -- your code will work best with one of them, but not both. If you really need to accept either, you can use templates, but I think you will find you always use one or the other even there.

2. I think it's a mistake to put a range interface directly in the interface. A range can be built with the buffered stream as its core if need be. I have long voiced my opinion that I/O should not implement ranges, and reference types should never be ranges. For example, you are going to implement byLine based not on the range interface, but on the other parts. Why must byLine be an external range, while "byBuffer" is built into the stream? In particular, I think popFront is an odd function for all buffered streams to have to implement.
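
A minimal sketch of point 1, assuming hypothetical names (UnbufferedInput, BufferedInput, ArraySource, Buffer) that are not taken from std_stream2. It only illustrates the idea that keeping the two interfaces unrelated lets the compiler reject double-buffering:

```d
// All names here are hypothetical; this is not the std_stream2 API.
interface UnbufferedInput
{
    // Reads up to buf.length bytes; returns the number read (0 at end).
    size_t read(ubyte[] buf);
}

interface BufferedInput   // deliberately does NOT extend UnbufferedInput
{
    // Returns the currently buffered bytes, refilling first if empty.
    const(ubyte)[] peek();
    // Marks the first n buffered bytes as used.
    void consume(size_t n);
}

// Raw source over an in-memory array, standing in for a file descriptor.
final class ArraySource : UnbufferedInput
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        immutable n = buf.length < data.length ? buf.length : data.length;
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Adds a buffer to a raw stream. The constructor only takes an
// UnbufferedInput, so passing another Buffer does not compile.
final class Buffer : BufferedInput
{
    private UnbufferedInput src;
    private ubyte[] store;
    private size_t lo, hi;

    this(UnbufferedInput src, size_t capacity = 4096)
    {
        this.src = src;
        store = new ubyte[capacity];
    }

    const(ubyte)[] peek()
    {
        if (lo == hi) { lo = 0; hi = src.read(store); }
        return store[lo .. hi];
    }

    void consume(size_t n)
    {
        assert(n <= hi - lo);
        lo += n;
    }
}
```

With these definitions, `new Buffer(new ArraySource(bytes))` compiles, while `new Buffer(someBuffer)` is rejected at compile time rather than at runtime.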


To voice my opinions on the questions:


-----

Question: Should we offer an open primitive at this level? If so, what parameter(s) should it take?

No, if you need a new stream, create a new instance. The OS processing required to open a file is going to dwarf any performance degradation of creating a new class on the heap. For types that may open quickly (say, an Array input stream), you can provide a function to re-open another array that doesn't have to go in the base interface.

Also note that opening a network stream requires quite different parameters than opening a file. Putting it at the interface level would require some sort of parsed-string parameter, which puts undue responsibility on such a basic interface.


-----

Question: Should we offer a primitive rewind that takes the stream back to the beginning? That might be supported even by some streams that don't support general seek calls. Alternatively, some streams might support seek(0, SeekAnchor.start) but not other calls to seek.

Considering that seek is already callable, even if the stream doesn't support it (because the interface defines it), I don't think it's unreasonable to selectively throw exceptions if the seek isn't possible. In other words, I think seek(0) is acceptable as an alternative to rewind().


However, you may also implement:

    final void rewind() { seek(0); }

directly in the interface if necessary.
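
A short sketch of how that could look, assuming a hypothetical Seekable interface and two toy implementations (none of these names come from the proposed design). D allows final functions with bodies inside an interface, so every implementer gets rewind for free, and a stream that only supports going back to the start can throw for anything else:

```d
// Hypothetical interface; only the final rewind() is the point here.
interface Seekable
{
    // Absolute seek; implementations may throw if the seek is not possible.
    void seek(ulong pos);
    ulong tell();

    // Final functions are allowed in D interfaces.
    final void rewind() { seek(0); }
}

// Fully seekable toy stream.
final class MemoryStream : Seekable
{
    private ulong pos;
    void seek(ulong p) { pos = p; }
    ulong tell() { return pos; }
}

// Toy stream that can only go back to the beginning.
final class RewindOnly : Seekable
{
    private ulong pos = 5;

    void seek(ulong p)
    {
        if (p != 0)
            throw new Exception("only seek(0) is supported");
        pos = 0;
    }

    ulong tell() { return pos; }
}
```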

-----

Question: May we eliminate seekFromCurrent and seekFromEnd and just have seek with absolute positioning? I don't know of streams that allow seek without allowing tell. Even if some stream doesn't, it's easy to add support for tell in a wrapper. The marginal cost of calling tell is small enough compared to the cost of seek.

I don't think the cost of tell is marginal. Support what the OS supports: all OSes support seeking from the current position, and reducing the number of system calls is preferable.


Also, how to implement seekFromEnd with just tell?
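
To make the system-call argument concrete, here is a small simulation, assuming a hypothetical FakeFile type whose `calls` counter stands in for real system calls (nothing here touches the OS). Emulating a relative seek with absolute positioning costs two calls instead of one, and seeking from the end needs the size, which tell alone does not provide:

```d
enum Anchor { start, current, end }

// In-memory stand-in for a file descriptor (hypothetical).
// `calls` counts simulated system calls.
struct FakeFile
{
    long pos;
    long size;
    int calls;

    long seek(long offset, Anchor anchor)
    {
        ++calls;
        final switch (anchor)
        {
            case Anchor.start:   pos = offset;        break;
            case Anchor.current: pos += offset;       break;
            case Anchor.end:     pos = size + offset; break; // needs size, not tell
        }
        return pos;
    }

    long tell()
    {
        ++calls;
        return pos;
    }
}

// Relative seek emulated with absolute positioning only: tell, then seek.
long seekFromCurrentViaTell(ref FakeFile f, long delta)
{
    return f.seek(f.tell() + delta, Anchor.start);
}
```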

-----

Question: Should this throw on an unopened stream? I don't think so, because throwing does not offer any additional information that user code didn't have, and the idiom if (s.isOpen) s.close() is verbose and frequently encountered.


I agree, don't throw on an unopened stream.

-----

Question: Should we allow read to return an empty slice even if atEnd is false? If we do, we allow non-blocking streams with burst transfer. However, naive client code on non-blocking streams will be inefficient because it would essentially implement busy-waiting.

Why not return an integer so different situations could be designated? That's how the read system call works, so you can tell when no data was read merely because the stream is non-blocking.

I realize it's sexy to return the data again so it can be used immediately, but in practice it's more useful to return an integer.

For example, if you want to fill a buffer, you need a loop anyways (there's no guarantee that the first read will fill the buffer), and at that point, you are just going to use the length member of the return value to advance your loop.

I'd say, return -1 if a non-blocking stream returns no data, 0 on EOF, positive on data read, and throw an exception on error.
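
A sketch of the fill-a-buffer loop under that return convention. RawInput, ScriptedInput, FillStatus and fill are hypothetical names made up for illustration; ScriptedInput is only a test double that replays a list of return values:

```d
// Hypothetical interface following the convention proposed above.
interface RawInput
{
    // > 0: bytes read; 0: end of stream; -1: non-blocking stream has no data yet.
    // Errors would be reported by throwing.
    ptrdiff_t read(ubyte[] buf);
}

// Test double that replays a scripted sequence of read results.
final class ScriptedInput : RawInput
{
    private ptrdiff_t[] script;
    this(ptrdiff_t[] script) { this.script = script; }

    ptrdiff_t read(ubyte[] buf)
    {
        if (script.length == 0) return 0;
        auto n = script[0];
        script = script[1 .. $];
        if (n > 0)
        {
            // Never report more than the caller's buffer can hold.
            if (cast(size_t) n > buf.length) n = cast(ptrdiff_t) buf.length;
            buf[0 .. cast(size_t) n] = 0xAB;
        }
        return n;
    }
}

enum FillStatus { full, eof, wouldBlock }

// Keeps reading until buf is full, the stream ends, or a read would block.
// `filled` reports how many bytes were stored in every case.
FillStatus fill(RawInput input, ubyte[] buf, out size_t filled)
{
    while (filled < buf.length)
    {
        immutable n = input.read(buf[filled .. $]);
        if (n == 0) return FillStatus.eof;
        if (n < 0)  return FillStatus.wouldBlock; // caller decides how to wait
        filled += cast(size_t) n;
    }
    return FillStatus.full;
}
```

The loop only ever looks at the integer result, which is the point made above: a returned slice would be used for nothing but its length.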


-----

Question: Should we allow an empty front on a non-empty stream? This goes back to handling non-blocking streams.

Well, streams shouldn't have a range interface anyways, but to answer this specific question, I'd say no. front should fill the buffer if it's empty. This follows the nature of all other ranges, where front is available on creation.
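
As an illustration of both points (range built outside the stream, front available on creation), here is a sketch assuming a hypothetical ChunkSource interface and an ArrayChunks toy source. ByChunk is a small struct layered over the stream and primes its first element in the constructor:

```d
import std.range.primitives : isInputRange;

// Hypothetical minimal source; stands in for any buffered stream.
interface ChunkSource
{
    // Returns the next chunk, or an empty slice at end of stream.
    const(ubyte)[] next();
}

// Toy source that hands out an array in fixed-size pieces.
final class ArrayChunks : ChunkSource
{
    private const(ubyte)[] data;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        this.data = data;
        this.chunk = chunk;
    }

    const(ubyte)[] next()
    {
        immutable n = chunk < data.length ? chunk : data.length;
        auto result = data[0 .. n];
        data = data[n .. $];
        return result;
    }
}

// The range is a struct built on top of the stream, not part of it.
struct ByChunk
{
    private ChunkSource src;
    private const(ubyte)[] current;

    this(ChunkSource src)
    {
        this.src = src;
        current = src.next();   // prime: front is valid right after construction
    }

    @property bool empty() const { return current.length == 0; }
    @property const(ubyte)[] front() const { return current; }
    void popFront() { current = src.next(); }
}

static assert(isInputRange!ByChunk);
```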


-----

Question: Should we eliminate this function? Theoretically calling advance(n) is equivalent with seekFromCurrent(n). However, in practice a file-based stream will have to implement advance even though the underlying file is not seekable.

I think it's good to have this function. At first, I didn't, but now I realize it's good because advance(n) may be low-performance (it may use read to advance the stream). If you eliminate this function, but put its functionality into seekFromCurrent, this makes seekFromCurrent low performance.

I think you should change the requirements, however, and follow the same return type as I specified above for read (-1 for wouldblock, 0 for EOF, positive for number of bytes 'advanced'). Otherwise, you have issues with non-blocking streams.
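
A sketch of advance on a stream that cannot seek, implemented by reading and discarding, with the return convention described above. RawInput and FixedLength are hypothetical names; the 256-byte scratch buffer is an arbitrary choice:

```d
// Hypothetical raw stream using the proposed convention:
// > 0 bytes read, 0 at end of stream, -1 if a non-blocking read would block.
interface RawInput
{
    ptrdiff_t read(ubyte[] buf);
}

// Toy source that yields a fixed number of zero bytes, then end of stream.
final class FixedLength : RawInput
{
    private size_t remaining;
    this(size_t remaining) { this.remaining = remaining; }

    ptrdiff_t read(ubyte[] buf)
    {
        immutable n = buf.length < remaining ? buf.length : remaining;
        buf[0 .. n] = 0;
        remaining -= n;
        return cast(ptrdiff_t) n;
    }
}

// Skips up to n bytes by reading and discarding them.
// Returns the number of bytes skipped; if nothing was skipped, returns
// 0 at end of stream or -1 when the first read would block.
ptrdiff_t advance(RawInput input, size_t n)
{
    ubyte[256] scratch;
    size_t skipped;
    while (skipped < n)
    {
        immutable left = n - skipped;
        immutable want = left < scratch.length ? left : scratch.length;
        immutable got = input.read(scratch[0 .. want]);
        if (got <= 0)
            return skipped > 0 ? cast(ptrdiff_t) skipped : got;
        skipped += cast(size_t) got;
    }
    return cast(ptrdiff_t) skipped;
}
```

This is the low-performance path mentioned above: each skipped block still costs a read, which is why folding it into seekFromCurrent would hide the cost.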


====================

OK, so now that I've voiced my opinions on what's there, I'll push the interface I had specified some time ago (incidentally, I am building an I/O library based on it). From my current skeleton:



    /**
     * Read data until a condition is satisfied.
     *
     * Buffers data from the input stream until the delegate returns other
     * than ~0. The delegate is passed the data read so far, and the start of
     * the data just read. The delegate should return ~0 if the condition is
     * not satisfied, or the number of bytes that should be returned otherwise.
     *
     * Any data that satisfies the condition will be considered consumed from
     * the stream.
     *
     * params: process = A delegate to determine satisfaction of a condition
     * per the terms above.
     *
     * returns: the data identified by the delegate that satisfies the
     * condition.  Note that this data may be owned by the buffer and so
     * shouldn't be written to or stored for later use without duping.
     */
    ubyte[] readUntil(uint delegate(ubyte[] data, uint start) process);

The advantage of such an interface is that it creates a very efficient way to specify how to buffer the data based on the data (i.e. byLine comes to mind).
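
To show how a byLine-style reader could sit on top of readUntil, here is a sketch assuming a hypothetical in-memory ChunkedStream. Its readUntil body is only one possible reading of the documented contract (in particular, returning the leftover bytes at end of stream is an assumption), and untilNewline is an example delegate:

```d
import std.functional : toDelegate;

// Hypothetical stream that delivers an array in small chunks, so the
// delegate may be called several times per readUntil call.
final class ChunkedStream
{
    private const(ubyte)[] source;
    private ubyte[] buffer;
    private size_t chunk;

    this(const(ubyte)[] source, size_t chunk)
    {
        this.source = source;
        this.chunk = chunk;
    }

    ubyte[] readUntil(scope uint delegate(ubyte[] data, uint start) process)
    {
        uint start = 0;   // bytes left over from a previous call count as new
        for (;;)
        {
            if (buffer.length == start)   // nothing unscanned: pull the next chunk
            {
                immutable n = chunk < source.length ? chunk : source.length;
                if (n == 0)               // end of stream: return what is left
                {
                    auto rest = buffer;
                    buffer = null;
                    return rest;
                }
                buffer ~= source[0 .. n];
                source = source[n .. $];
            }
            immutable used = process(buffer, start);
            if (used != ~0u)
            {
                auto result = buffer[0 .. used];   // still owned by the buffer
                buffer = buffer[used .. $];
                return result;
            }
            start = cast(uint) buffer.length;
        }
    }
}

// Example condition: scan only the newly read bytes for '\n'.
uint untilNewline(ubyte[] data, uint start)
{
    foreach (i; cast(size_t) start .. data.length)
        if (data[i] == '\n')
            return cast(uint) (i + 1);   // include the terminator
    return ~0u;
}
```

A caller would pass `toDelegate(&untilNewline)` (or a delegate literal) and get one line per call; because `start` marks where the new data begins, bytes already scanned are not scanned again.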

Here is a second function that does the same as above but appends it directly into a user-supplied buffer:

    size_t appendUntil(uint delegate(ubyte[] data, uint start) process, ref ubyte[] arr);


-Steve
