Re: Ranges and/versus iterators

Fawzi Mohamed Wed, 24 Mar 2010 16:10:11 -0700


On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:

On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:
On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:
The Phobos file I/O functions all avoid doing any more buffering than
the backing FILE* does. They achieve performance by locking the file
once with flockfile/funlockfile and then using fgetc_unlocked().

This puts me in real trouble with the formatted reading functions (a
la fscanf but generalized to all input ranges), which I'm gestating
about. The problem with the current API is that if you call
input.front(), it will call fgetc(). But then say I decide I'm done
with the range, as is the case with e.g. reading an integer and
stopping at the first non-digit. That non-digit character will be
lost. So there's a need to say, hey, put this guy back because whoever
reads after me will need to look at it. So I need a putBackFront() or
something (which would call fungetc()). I wish things were simpler.
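To make the put-back problem concrete, here is a minimal D sketch that reads an unsigned integer from a C FILE* and hands the terminating character back to the stream with ungetc(), the standard C put-back call. The helper name readUInt is made up for illustration and is not a Phobos function; overflow handling is left out.

```d
import core.stdc.stdio : FILE, EOF, fgetc, ungetc, tmpfile, fputs, rewind, fclose;

/// Hypothetical helper (not part of Phobos): reads decimal digits from f
/// and pushes the first non-digit back so the next reader still sees it.
int readUInt(FILE* f)
{
    int value = 0;
    int c;
    while ((c = fgetc(f)) != EOF && c >= '0' && c <= '9')
        value = value * 10 + (c - '0');
    if (c != EOF)
        ungetc(c, f); // without this, the non-digit would be lost
    return value;
}

unittest
{
    FILE* f = tmpfile();
    assert(f !is null);
    fputs("42abc", f);
    rewind(f);
    assert(readUInt(f) == 42);
    assert(fgetc(f) == 'a'); // the terminating character is still available
    fclose(f);
}
```

The sketch can be exercised with `dmd -unittest -main -run` on the file. It only works because the reader talks to the FILE* directly; a generic input range has no equivalent of ungetc(), which is the gap described above.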
I had a pushBack (I called that unget) in
http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I
recently removed that in favor of a peek function that I think is much
more flexible.
Thanks for sharing your design with me. Yes, peek() is more flexible than get/unget, but I'm under the stdio tyranny.
In fact I just realized something - I could call

setvbuf(_handle, null, _IONBF, 0)
whenever I bind a File to a FILE*. That way File can do its own buffering and can implement peek() etc. I wonder if we need to worry about sharing, because e.g. several threads would want to write to stdout.

well for stdout by default I use locking to ensure that each write outputs its chunk atomically. I would say that by default streams imply sequence, so they can safely be non-threadsafe. stdout, stderr and logging are exceptions: there, at least chunks should be written atomically.

What I did is to base most parsing on CharReaders (for example the char
based ones from BasicIO):
{{{
/// extent of a slice of a buffer
enum SliceExtent{ Partial, Maximal, ToEnd }

/// a delegate that reads in from a character source
alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)
CharReader;

/// a handler of CharReader, returns true if something was read
alias bool delegate(CharReader)CharReaderHandler;
}}}

a char reader reads from the given buffer buf, and can either request
more (by returning EOF), or eat some characters out of it. If it sets
iterate to true it wants to iterate with the eaten buffer (useful to, for
example, skip an undefined amount of whitespace that might overflow the
buffer).
Once you have that you can easily create a Peeker structure that wraps a
CharReader, and exposes a CharReader that tries to match it, but always
eats 0 characters, even if the match was successful.
With it you can have a peek method that returns true if the CharReader
that you pass in matches, false if it does not match, and what you want
if the buffer is too small to resolve the issue.
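A rough sketch of the Peeker idea in D follows. It is not the blip implementation: the sentinel needMore (standing in for the "EOF" return mentioned above), the digitReader helper, and the choice to forward the "need more data" request when the buffer is too small are all assumptions made for illustration.

```d
import std.ascii : isDigit;

/// extent of a slice of a buffer (as declared in the post)
enum SliceExtent { Partial, Maximal, ToEnd }

/// a delegate that reads in from a character source (as declared in the post)
alias CharReader = size_t delegate(char[] buf, SliceExtent slice, out bool iterate);

/// Assumed sentinel meaning "give me a larger buffer"; the real value may differ.
enum size_t needMore = size_t.max;

/// Hypothetical Peeker: runs the wrapped reader but never consumes anything.
struct Peeker
{
    CharReader inner;
    bool matched; // result of the last decided peek

    size_t peek(char[] buf, SliceExtent slice, out bool iterate)
    {
        matched = false;
        bool innerIterate;
        size_t eaten = inner(buf, slice, innerIterate);
        if (eaten == needMore)
            return needMore; // undecided: buffer too small
        matched = eaten > 0;
        return 0;            // always eat 0 characters
    }
}

/// Example reader (illustrative only): eats a run of ASCII digits.
CharReader digitReader()
{
    return delegate size_t(char[] buf, SliceExtent slice, out bool iterate)
    {
        size_t n = 0;
        while (n < buf.length && isDigit(buf[n]))
            ++n;
        // Only digits so far and more input may follow: ask for more.
        if (n == buf.length && slice != SliceExtent.ToEnd)
            return needMore;
        return n;
    };
}

unittest
{
    auto p = Peeker(digitReader());
    CharReader peekDigits = &p.peek; // usable wherever a CharReader is expected
    bool iterate;

    assert(peekDigits("123x".dup, SliceExtent.Maximal, iterate) == 0);
    assert(p.matched);
    assert(peekDigits("abc".dup, SliceExtent.Maximal, iterate) == 0);
    assert(!p.matched);
    assert(peekDigits("123".dup, SliceExtent.Partial, iterate) == needMore);
}
```

Because the Peeker itself exposes a CharReader, it can be handed to the same handler machinery as any other reader, which is what makes peek more flexible than a single-character unget.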

Most of these things are templates that work for any type T.
Wait, if you called it CharReader, how come it works with any type T? Or are you referring to T as the parsed type?

Well I presented the CharReader for simplicity, and that is indeed only for chars, but most things can be generalized, and indeed if you look at the Reader(T) interface in http://github.com/fawzi/blip/blob/master/blip/io/BasicIO.d or blip.text.TextParser or similar, they are templated with a generic type T. For TextParser I was thinking T=char, wchar or dchar, whereas other cases are even more generic.

Then I
built buffered types that using a size_t delegate(T[]) give a Reader
based interface.
All this is not based on single elements anymore, but on arrays (ranges?
:), but I think that is what is needed for efficient i/o.
Sounds good, but I wonder why you use delegates instead of classes. Is that for simplicity?


there are both, and both have their place.

delegates are very simple and can be easily built on the fly, I like that very much, they reduce the code footprint of various things. More complex behaviour is better captured by classes, and indeed there are (also in BasicIO) the following interfaces:


interface OutStreamI{
    void rawWriteStr(char[]);
    void rawWriteStr(wchar[]);
    void rawWriteStr(dchar[]);
    void rawWrite(void[]);
    CharSink charSink();
    BinSink binSink();
    void flush();
    void close();
}

/// a reader of elements of type T
interface Reader(T){
    /// read some data into the given buffer
    size_t readSome(T[]);
    /// character reader handler
    bool handleReader(size_t delegate(T[], SliceExtent slice, out bool iterate) r);

    /// shutdown the input source
    void shutdownInput();
}

/// one or more readers
interface MultiReader{
    enum Mode{ Binary=1, Char=2, Wchar=4, Dchar=8 }
    /// returns the modes this reader supports
    uint modes();
    /// returns the native modes of this reader (less overhead)
    uint nativeModes();
    Reader!(char) readerChar();
    Reader!(wchar) readerWchar();
    Reader!(dchar) readerDchar();
    Reader!(void) readerBin();
    void shutdownInput();
}

there are classes that can create the more full-fledged objects out of delegates.

I confess it's not 100% clear to me how the delegates are supposed to be used in concert, particularly why there's a need for both CharReader and CharReaderHandler.


mainly one needs CharReader, which is a method that reads something.

CharReaderHandler is there just for completeness: it is a delegate to a method that actually reads, but normally one simply uses that method directly, i.e. one uses a Reader!(T).handleReader method...

On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
I don't think we should give up on trying to make a stream range that
is not awkward, I really dislike the way today's input ranges map to
streams.
Me too. Let's keep on looking, I have the feeling something good is
right behind the corner. But then I felt that way for a year :o).
give a try to
bool popFront(ref T) (or next, or another name, or even just a delegate
with that signature)
I was surprised how well it works, not perfect but better than the other
alternatives I had tried.

loop on a T[] array:
bool popFront(ref T* el);
So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly?

well the foreach loop is the same, but the iteration loop is indeed different in the sense that one uses a pointer to an element and the other the element itself. One can write code that removes the pointer that is there (dereferencing it, or defining an inline function and then calling it, which allows you to reuse the same variable name):

void myF(ref T x){
 // code that uses x as a plain element
}
myF(*x); // x is the T* obtained from popFront

(that is a nice trick that I used several times).

But yes there *is* a difference and the difference is that with arrays you might modify the element, modifying the stored value, whereas with streams you can't.

This is a conceptual difference, and it is reflected in the interface.
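The two flavours of popFront can be sketched side by side as follows. The types Counter and ArrayIter are invented for this example and do not come from blip or Phobos; the point is only that the stream-like source hands out copies, while the array iterator hands out pointers through which the stored values can be changed.

```d
/// Hypothetical stream-like source: elements are handed out by value.
struct Counter
{
    int next, limit;

    bool popFront(ref int el)
    {
        if (next >= limit)
            return false;
        el = next++; // the caller receives a copy
        return true;
    }
}

/// Hypothetical array iterator: elements are handed out by pointer.
struct ArrayIter(T)
{
    T[] rest;

    bool popFront(ref T* el)
    {
        if (rest.length == 0)
            return false;
        el = &rest[0];       // points into the original storage
        rest = rest[1 .. $];
        return true;
    }
}

unittest
{
    // Stream style: read-only consumption.
    auto c = Counter(0, 4);
    int v;
    int sum = 0;
    while (c.popFront(v))
        sum += v;
    assert(sum == 6);

    // Array style: the pointer lets the loop modify the stored values.
    int[] data = [1, 2, 3];
    auto it = ArrayIter!int(data);
    int* p;
    while (it.popFront(p))
        *p *= 10;
    assert(data == [10, 20, 30]);
}
```

In both loops the shape `while (x.popFront(el))` is the same; only the type of el differs, which matches the remark that the foreach loop looks identical while the explicit iteration does not.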

One can then discuss if immutable arrays should be iterated with immutable pointers or with values (i.e. copying) just as streams are.


Andrei
