On 24-mar-10, at 23:29, Andrei Alexandrescu wrote:
On 03/24/2010 09:00 AM, Fawzi Mohamed wrote:
On 24-mar-10, at 03:51, Andrei Alexandrescu wrote:
The Phobos file I/O functions all avoid doing any more buffering
than
the backing FILE* does. They achieve performance by locking the file
once with flockfile/funlockfile and then using fgetc_unlocked().
This puts me in real trouble with the formatted reading functions (a
la fscanf but generalized to all input ranges), which I'm gestating
about. The problem with the current API is that if you call
input.front(), it will call fgetc(). But then say I decide I'm done
with the range, as is the case with e.g. reading an integer and
stopping at the first non-digit. That non-digit character will be
lost. So there's a need to say, hey, put this guy back because
whoever
reads after me will need to look at it. So I need a putBackFront()
or
something (which would call fungetc()). I wish things were simpler.
I had a pushBack (I called that unget) in
http://github.com/fawzi/blip/blob/master/blip/text/TextParser.d ,
but I
recently removed that in favor of a peek function that I think is
much
more flexible.
Thanks for sharing your design with me. Yes, peek() is more flexible
than get/unget, but I'm under the stdio tyranny.
In fact I just realized something - I could call
setvbuf(_handle, null, _IONBF, 0)
whenever I bind a File to a FILE*. That way File can do its own
buffering and can implement peek() etc. I wonder if we need to worry
about sharing, because e.g. several threads would want to write to
stdout.
well for stdout by default I use locking to ensure writing writes
chunks atomically.
I would say that by default streams imply sequence, so can safely be
non threadsafe.
stdout, stderr and logging are exceptions, there at least chunks
should be written atomically.
What I did is to base most parsing on CharReaders (for example the
char
based ones from BasicIO):
{{{
/// extent of a slice of a buffer
enum SliceExtent{ Partial, Maximal, ToEnd }
/// a delegate that reads in from a character source
alias size_t delegate(char[]buf, SliceExtent slice,out bool iterate)
CharReader;
/// a handler of CharReader, returns true if something was read
alias bool delegate(CharReader)CharReaderHandler;
}}}
a char reader reads from the given buffer buf, and can either request
more (by returning EOF), or eat some characters out of it. If it sets
iterate to true it wants to iterate with the eaten buffer (useful
to for
example skip undefined amount of whitespace that might overflow the
buffer).
Once you have that you can easily create a Peeker structure that
wraps a
CharReader, and exposes a CharReaded that tries to match it, but
always
eats 0 characters, even if the match was successful.
With it you can have a peek method that returns true if the
CharReader
that you pass in matches, false if it does not match, and what you
want
if the buffer is too small to resolve the issue.
Most of these things are templates that work for any type T.
Wait, if you called it CharReader, how come it works with any type
T? Or are you referring to T as the parsed type?
Well I presented the CharReader for simplicity, and that is indeed
only for chars, but most things can be generalized, and indeed if you
look at
the Reader(T) interface in http://github.com/fawzi/blip/blob/master/blip/io/BasicIO.d
or blip.text.TextParser or similar they are templated with a generic
type T.
For TextParser I was thinking T=char,wchar or dchar, whereas others
cases are even more generic.
Then I
built buffered types that using a size_t delegate(T[]) give a Reader
based interface.
All this is not based on single elements anymore, but on arrays
(ranges?
:), but I think that is what is needed for efficient i/o.
Sounds good, but I wonder why you use delegates instead of classes.
Is that for simplicity?
there are both, and both have their place.
delegates are very simple and can be easily built on the fly, I like
that very much, they reduce the code footprint of various things.
More complex behaviour is better captured by classes, and indeed there
are (also in BasicIO) the following interfaces:
interface OutStreamI{
void rawWriteStr(char[]);
void rawWriteStr(wchar[]);
void rawWriteStr(dchar[]);
void rawWrite(void[]);
CharSink charSink();
BinSink binSink();
void flush();
void close();
}
/// a reader of elements of type T
interface Reader(T){
/// read some data into the given buffer
size_t readSome(T[]);
/// character reader handler
bool handleReader(size_t delegate(T[], SliceExtent slice,out bool
iterate) r);
/// shutdown the input source
void shutdownInput();
}
/// one or more readers
interface MultiReader{
enum Mode{ Binary=1, Char=2, Wchar=4, Dchar=8 }
/// returns the modes this reader supports
uint modes();
/// returns the native modes of this reader (less overhead)
uint nativeModes();
Reader!(char) readerChar();
Reader!(wchar) readerWchar();
Reader!(dchar) readerDchar();
Reader!(void) readerBin();
void shutdownInput();
}
there are classes that can create the more full fledged objects out of
delegates.
I confess it's not 100% clear to me how the delegates are supposed
to be used in concert, particularly why there's a need for both
CharReader and CharReaderHandler.
mainly one needs CharReader, which is a method that reads something.
CharReaderHandler is there just for completeness, it is a delegate of
a method that actually reads, but normally one simply uses that
method, i.e. it uses a Reader!(T).handleReader method...
On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:
I don't think we should give up on trying to make a stream range
that
is not awkward, I really dislike the way today's input ranges map
to
streams.
Me too. Let's keep on looking, I have the feeling something good is
right behind the corner. But then I felt that way for a year :o).
give a try to
bool popFront(ref T) ( or next, or another name, or even just a
delegate
with that signature)
I was surprised how well it works, not perfect but better than the
other
alternatives I had tried.
loop on a T[] array:
bool popFront(ref T* el);
So arrays have a different interface than streams. It looks like you
can't write code that works uniformly for both, because for some you
need the * and for some you don't. Did I understand that correctly?
well the foreach loop is the same, but the iteration loop is indeed
different in the sense that one uses a pointer to an element and the
other the element itself.
one can write code that removes the pointer that is there
(dereferencing it, or doing and inline function with subsequent call
which allows you to reuse the same variable name):
void myF(ref x){
// code
}
myF(*x);
(that is a nice trick that I used several times).
But yes there *is* a difference and the difference is that with arrays
you might modify the element, modifying the stored value, whereas with
streams you can't.
This conceptual difference and if reflected in the interface.
One can then discuss if immutable arrays should be iterated with
immutable pointers or with values (i.e. copying) just as streams are.
Andrei