On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:

On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:

The Phobos file I/O functions all avoid doing any more buffering than
the backing FILE* does. They achieve performance by locking the file
once with flockfile/funlockfile and then using fgetc_unlocked().
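A minimal C sketch of the lock-once pattern described above, for illustration only. It uses the POSIX getc_unlocked(); fgetc_unlocked() is a GNU extension that behaves the same way. The helper count_lines is hypothetical and not part of Phobos.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Count newline characters in f, taking the FILE* lock only once.
   flockfile() acquires the stream's internal lock; getc_unlocked()
   then reads without re-locking on every character. */
static size_t count_lines(FILE *f)
{
    size_t lines = 0;
    int c;

    assert(f != NULL);
    flockfile(f);
    while ((c = getc_unlocked(f)) != EOF) {
        if (c == '\n')
            ++lines;
    }
    funlockfile(f);
    return lines;
}
```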

This puts me in real trouble with the formatted reading functions (à
la fscanf but generalized to all input ranges), which I'm still
gestating. The problem with the current API is that if you call
input.front(), it will call fgetc(). But then say I decide I'm done
with the range, as is the case with e.g. reading an integer and
stopping at the first non-digit. That non-digit character will be
lost. So there's a need to say, hey, put this guy back because whoever
reads after me will need to look at it. So I need a putBackFront() or
something (which would call ungetc()). I wish things were simpler.
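A minimal C sketch of the situation above: read an integer and stop at the first non-digit without losing it. In standard C the put-back call is ungetc(), which guarantees one character of push-back. read_uint is a hypothetical helper, not a Phobos function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Read an unsigned decimal integer from f. Returns 1 on success,
   0 if no digit was found. The first non-digit is pushed back with
   ungetc() so the next reader of f still sees it. Overflow is not
   checked in this sketch. */
static int read_uint(FILE *f, unsigned long *out)
{
    unsigned long value = 0;
    int digits = 0;
    int c;

    while ((c = getc(f)) != EOF && isdigit(c)) {
        value = value * 10 + (unsigned long)(c - '0');
        ++digits;
    }
    if (c != EOF)
        ungetc(c, f); /* give the non-digit back to the stream */
    if (digits == 0)
        return 0;
    *out = value;
    return 1;
}
```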

I had a pushBack (I called that unget) in
http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d , but I
recently removed that in favor of a peek function that I think is much
more flexible.

Thanks for sharing your design with me. Yes, peek() is more flexible than get/unget, but I'm under the stdio tyranny.

In fact, I just realized something: I could call

setvbuf(_handle, null, _IONBF, 0)

whenever I bind a File to a FILE*. That way File can do its own buffering and can implement peek() etc. I wonder whether we need to worry about sharing, because, e.g., several threads might want to write to stdout.
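A rough C sketch of the setvbuf idea, assuming the wrapper takes over the FILE* before any other I/O happens on it: the stream is made unbuffered, the wrapper keeps its own buffer, and peek() becomes trivial. struct reader and its functions are hypothetical; the sketch is single-threaded and does not address the sharing question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader that owns the only buffer. */
struct reader {
    FILE *f;
    char buf[4096];
    size_t pos, len;
};

static void reader_init(struct reader *r, FILE *f)
{
    /* C requires setvbuf() before any other operation on f. */
    setvbuf(f, NULL, _IONBF, 0);
    r->f = f;
    r->pos = r->len = 0;
}

/* Return the next character without consuming it, or EOF. */
static int reader_peek(struct reader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, sizeof r->buf, r->f);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return (unsigned char)r->buf[r->pos];
}

/* Consume and return the next character, or EOF. */
static int reader_get(struct reader *r)
{
    int c = reader_peek(r);

    if (c != EOF)
        ++r->pos;
    return c;
}
```

Because lookahead lives in the wrapper's own buffer, no put-back call is needed: peeking is just reading buf[pos] without advancing.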

What I did was to base most parsing on CharReaders (for example, the
char-based ones from BasicIO):
{{{
/// extent of a slice of a buffer
enum SliceExtent{ Partial, Maximal, ToEnd }

/// a delegate that reads in from a character source
alias size_t delegate(char[] buf, SliceExtent slice, out bool iterate) CharReader;

/// a handler of CharReader, returns true if something was read
alias bool delegate(CharReader) CharReaderHandler;
}}}

A CharReader reads from the given buffer buf and can either request
more input (by returning EOF) or eat some characters from it. If it
sets iterate to true, it wants to be called again once the eaten
characters have been removed from the buffer (useful, for example, to
skip an unbounded amount of whitespace that might overflow the buffer).
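To make the protocol concrete, here is a loose C translation under several assumptions: a D delegate becomes a function pointer plus a void *ctx, READER_MORE stands in for the EOF reply that requests more input, SLICE_TO_END is taken to mean that no more input will follow, and run_reader is a hypothetical driver that drops eaten characters and calls the reader again while it sets iterate. None of these names come from blip.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed mapping of SliceExtent; only PARTIAL and TO_END are used. */
typedef enum { SLICE_PARTIAL, SLICE_MAXIMAL, SLICE_TO_END } slice_extent;

/* Assumed stand-in for the "EOF" reply that asks for more input. */
#define READER_MORE ((size_t)-1)

/* A CharReader as a function pointer plus a context pointer. */
typedef size_t (*char_reader)(void *ctx, const char *buf, size_t len,
                              slice_extent slice, bool *iterate);

/* Eat leading whitespace. If every byte shown was whitespace and more
   input may follow, ask to be called again via *iterate. */
static size_t skip_ws(void *ctx, const char *buf, size_t len,
                      slice_extent slice, bool *iterate)
{
    size_t i = 0;

    (void)ctx;
    while (i < len && isspace((unsigned char)buf[i]))
        ++i;
    *iterate = (i == len && slice != SLICE_TO_END);
    return i;
}

/* Hypothetical driver: show rd at most `window` bytes of data at a
   time. Eaten bytes are dropped; READER_MORE doubles the window.
   Returns the offset of the first byte not eaten. */
static size_t run_reader(char_reader rd, void *ctx,
                         const char *data, size_t n, size_t window)
{
    size_t start = 0;

    assert(window > 0);
    for (;;) {
        size_t len = (n - start < window) ? n - start : window;
        slice_extent slice =
            (start + len == n) ? SLICE_TO_END : SLICE_PARTIAL;
        bool iterate = false;
        size_t eaten = rd(ctx, data + start, len, slice, &iterate);

        if (eaten == READER_MORE) {
            if (slice == SLICE_TO_END)
                return start;   /* no more input to offer */
            window *= 2;        /* buffer too small: grow it */
            continue;
        }
        start += eaten;
        if (!iterate)
            return start;
    }
}
```

With a 2-byte window, skip_ws on "   \t  x" eats whitespace across three calls and stops at the x, which is the unbounded-whitespace case the iterate flag is meant for.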

Once you have that, you can easily create a Peeker structure that wraps a
CharReader and exposes a CharReader that tries to match it but always
eats 0 characters, even if the match was successful.
With it you can have a peek method that returns true if the CharReader
you pass in matches, false if it does not match, and whatever value you
choose if the buffer is too small to resolve the question.
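A loose C sketch of the Peeker idea, assuming a D delegate maps to a function pointer plus a void *ctx and that READER_MORE stands in for the EOF "need more input" reply. The peeker runs the wrapped reader, records whether it matched (assumed here to mean it ate at least one character), and always reports 0 characters eaten; the caller picks the value returned when the buffer is too short. match_word, struct peeker and peek are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SLICE_PARTIAL, SLICE_MAXIMAL, SLICE_TO_END } slice_extent;
#define READER_MORE ((size_t)-1)   /* assumed "need more input" reply */
typedef size_t (*char_reader)(void *ctx, const char *buf, size_t len,
                              slice_extent slice, bool *iterate);

/* Example inner reader: eat the word in ctx if buf starts with it. */
static size_t match_word(void *ctx, const char *buf, size_t len,
                         slice_extent slice, bool *iterate)
{
    const char *word = ctx;
    size_t wlen = strlen(word);

    *iterate = false;
    if (len < wlen) {
        /* buf is a prefix of word: more input could still complete it */
        if (slice != SLICE_TO_END && memcmp(buf, word, len) == 0)
            return READER_MORE;
        return 0;
    }
    return memcmp(buf, word, wlen) == 0 ? wlen : 0;
}

struct peeker {
    char_reader inner;
    void *inner_ctx;
    int if_undecided;  /* caller's choice when buf is too short */
    int result;        /* 1 = match, 0 = no match, or if_undecided */
};

/* Itself a char_reader: runs the inner reader but never eats anything. */
static size_t peeker_read(void *ctx, const char *buf, size_t len,
                          slice_extent slice, bool *iterate)
{
    struct peeker *p = ctx;
    bool inner_iterate = false;
    size_t eaten = p->inner(p->inner_ctx, buf, len, slice, &inner_iterate);

    *iterate = false;
    if (eaten == READER_MORE)
        p->result = p->if_undecided;
    else
        p->result = eaten > 0;   /* assumed: a match eats >= 1 char */
    return 0;
}

/* 1 if rd matches at the start of buf, 0 if not, else if_undecided. */
static int peek(char_reader rd, void *rd_ctx, const char *buf, size_t len,
                slice_extent slice, int if_undecided)
{
    struct peeker p = { rd, rd_ctx, if_undecided, 0 };
    char_reader as_reader = peeker_read;
    bool iterate;

    (void)as_reader(&p, buf, len, slice, &iterate);
    return p.result;
}
```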

Most of these things are templates that work for any type T.

Wait, if you called it CharReader, how come it works with any type T? Or are you referring to T as the parsed type?

Then I
built buffered types that, given a size_t delegate(T[]), provide a
Reader-based interface.

None of this is based on single elements anymore; it is based on arrays
(ranges? :). But I think that is what efficient I/O needs.
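A minimal C sketch of an array-oriented buffered reader, assuming a source callback that plays the role of size_t delegate(T[]) (specialized to char here): callers get the whole buffered slice at once and report how much of it they consumed. fill_fn, struct buf_reader and struct str_source are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Source callback: write up to cap bytes into dst, return the count
   (0 means the source is exhausted). */
typedef size_t (*fill_fn)(void *ctx, char *dst, size_t cap);

struct buf_reader {
    fill_fn fill;
    void *ctx;
    char buf[64];
    size_t pos, len;
};

/* Return the currently buffered slice, refilling if it is empty.
   *len == 0 means the source is exhausted. */
static const char *br_window(struct buf_reader *r, size_t *len)
{
    if (r->pos == r->len) {
        r->len = r->fill(r->ctx, r->buf, sizeof r->buf);
        r->pos = 0;
    }
    *len = r->len - r->pos;
    return r->buf + r->pos;
}

/* Mark n bytes of the current slice as consumed. */
static void br_consume(struct buf_reader *r, size_t n)
{
    assert(n <= r->len - r->pos);
    r->pos += n;
}

/* Example source: serves a string in chunks of at most 3 bytes. */
struct str_source { const char *s; size_t off, n; };

static size_t str_fill(void *ctx, char *dst, size_t cap)
{
    struct str_source *src = ctx;
    size_t k = src->n - src->off;

    if (k > 3)
        k = 3;
    if (k > cap)
        k = cap;
    memcpy(dst, src->s + src->off, k);
    src->off += k;
    return k;
}
```

A parser built on this can scan an entire slice in a tight loop and consume it in one call, rather than paying a function call per element.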

Sounds good, but I wonder why you use delegates instead of classes. Is that for simplicity?

I confess it's not 100% clear to me how the delegates are supposed to be used in concert, particularly why there's a need for both CharReader and CharReaderHandler.

On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
I don't think we should give up on trying to make a stream range that
is not awkward, I really dislike the way today's input ranges map to
streams.

Me too. Let's keep on looking; I have the feeling something good is
just around the corner. But then I've felt that way for a year :o).

Give bool popFront(ref T) a try (or next, or another name, or even just a
delegate with that signature).
I was surprised how well it works: not perfect, but better than the other
alternatives I had tried.

To loop over a T[] array:
bool popFront(ref T* el);
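A minimal C sketch of the two loop styles above, with hypothetical names: the stream version copies each element into a caller-provided variable, while the array version hands out a pointer into the array itself, which avoids the copy and allows in-place updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stream style, like bool popFront(ref T): copy the next element out. */
struct int_stream { const int *src; size_t i, n; };

static bool stream_next(struct int_stream *s, int *out)
{
    if (s->i == s->n)
        return false;
    *out = s->src[s->i++];
    return true;
}

/* Array style, like bool popFront(ref T* el): point into the array. */
struct int_array_iter { int *arr; size_t i, n; };

static bool array_next(struct int_array_iter *it, int **el)
{
    if (it->i == it->n)
        return false;
    *el = &it->arr[it->i++];
    return true;
}
```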

So arrays have a different interface than streams. It looks like you can't write code that works uniformly for both, because for some you need the * and for some you don't. Did I understand that correctly?

Andrei
