Re: [RFC] I/O and Buffer Range

Steven Schveighoffer Thu, 16 Jan 2014 12:06:38 -0800

On Thu, 16 Jan 2014 13:44:08 -0500, Dmitry Olshansky <[email protected]> wrote:

16-Jan-2014 19:55, Steven Schveighoffer wrote:

On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
<[email protected]> wrote:

Then our goals are aligned. Be sure to take a peek at (if you haven't
already):
https://github.com/schveiguy/phobos/blob/new-io/std/io.d


Yes, I'm gearing up to revisit that after a long D hiatus, and I came
across this thread.

At this point, I really really like the ideas that you have in this. It
solves an issue that I struggled with, and my solution was quite clunky.

I am thinking of this layout for streams/buffers:

1. Unbuffered stream used for raw i/o, based on a class hierarchy (which
I have pretty much written)
2. Buffer like you have, based on a struct, with specific primitives.
Its job is to collect data from the underlying stream, and present it
to consumers as a random-access buffer.

The only interesting thing I'd add here is that some buffers may work
without an underlying stream. The best examples are arrays and MM-files.

Yes, but I would stress that for convenience, the buffer should forward
some of the stream primitives (such as seeking) in cases where seeking is
possible, at least in the case of a buffer that wraps a stream.

That actually is another point that would have sucked with my class-based
solution -- allocating a class to use an array as backing.

3. Filter that has access to transform the buffer data/copy it.
4. Ranges that use the buffer/filter to process/present the data.
Yes, yes and yes. I find it surprisingly good to see our vision seems to
match. I was half-expecting you'd come along and destroy it all ;)

:) I've been preaching for a while that ranges don't make good streams,
and that streams should be classes, but I hadn't considered splitting out
the buffer. I think it's the right balance.
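
A minimal sketch of how layers 1, 2 and 4 of the layout above could fit
together. All names here (RawStream, ArrayStream, SimpleBuffer, nextLine)
are hypothetical and are not taken from either code base; the filter
layer is left out to keep the example short.

```d
import std.algorithm.comparison : min;
import std.algorithm.searching : countUntil;

// Layer 1: unbuffered raw stream as a class hierarchy (hypothetical interface).
interface RawStream
{
    // Reads up to dest.length bytes; returns the count read, 0 at end of stream.
    size_t read(ubyte[] dest);
}

// Stand-in for a file or socket: serves bytes from memory.
final class ArrayStream : RawStream
{
    private const(ubyte)[] data;
    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] dest)
    {
        immutable n = min(dest.length, data.length);
        dest[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Layer 2: struct buffer that collects stream data and exposes it
// to consumers as a random-access window.
struct SimpleBuffer
{
    private RawStream source;
    private ubyte[] store;

    this(RawStream source) { this.source = source; }

    // Everything buffered so far and not yet discarded.
    const(ubyte)[] window() const { return store; }

    // Pulls up to n more bytes from the stream; returns how many arrived.
    size_t extend(size_t n)
    {
        immutable old = store.length;
        store.length = old + n;
        immutable got = source.read(store[old .. $]);
        store.length = old + got;
        return got;
    }

    // Drops n consumed bytes from the front of the window.
    void discard(size_t n) { store = store[n .. $]; }
}

// Layer 4: consumer that pulls one line at a time through the buffer.
// Returns false once both the stream and the buffer are exhausted.
bool nextLine(ref SimpleBuffer buf, out string line)
{
    for (;;)
    {
        immutable pos = buf.window().countUntil(cast(ubyte) '\n');
        if (pos >= 0)
        {
            line = cast(string) buf.window()[0 .. pos].idup;
            buf.discard(pos + 1);   // also drop the newline
            return true;
        }
        if (buf.extend(16) == 0)    // stream is dry
        {
            if (buf.window().length == 0)
                return false;
            line = cast(string) buf.window().idup;  // unterminated last line
            buf.discard(line.length);
            return true;
        }
    }
}

void main()
{
    auto buf = SimpleBuffer(new ArrayStream(cast(const(ubyte)[]) "alpha\nbeta\ngamma"));
    string line;
    string[] lines;
    while (nextLine(buf, line))
        lines ~= line;
    assert(lines == ["alpha", "beta", "gamma"]);
}
```

The stream needs a class allocation, while the buffer is a plain struct
that an array-backed variant could implement with no stream at all.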

The problem I struggled with is the presentation of UTF data of any
format as char[] wchar[] or dchar[]. 2 things need to happen. First is
that the data needs to be post-processed to perform any necessary byte
swapping. The second is to transcode the data into the correct width.

In this way, you can process UTF data of any type (I even have code to
detect the encoding and automatically process it), and then use it in a
way that makes sense for your code.
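
The two steps can be sketched for one concrete case, assuming big-endian
UTF-16 input that the consumer wants as UTF-8. The helper name fromUtf16BE
is made up for this illustration; std.utf.toUTF8 is the existing Phobos
function.

```d
import std.utf : toUTF8;

// Step 1: byte swapping. Assembles each big-endian byte pair into a
// native wchar code unit, independent of the host byte order.
wchar[] fromUtf16BE(const(ubyte)[] raw)
{
    assert(raw.length % 2 == 0, "truncated UTF-16 input");
    auto units = new wchar[raw.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)((raw[2 * i] << 8) | raw[2 * i + 1]);
    return units;
}

void main()
{
    // "D" (U+0044) followed by "e with acute" (U+00E9), UTF-16BE encoded.
    const(ubyte)[] raw = [0x00, 0x44, 0x00, 0xE9];

    wchar[] utf16 = fromUtf16BE(raw);   // step 1: byte swap
    string utf8 = toUTF8(utf16);        // step 2: transcode to the wanted width

    assert(utf8 == "D\u00E9");
    assert(utf8.length == 3);           // 1 + 2 UTF-8 code units
}
```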

My solution was to paste in a "processing" delegate into the class
hierarchy of buffered streams that allowed one read/write access to the
buffer. But it's clunky, and difficult to deal with in a generalized
fashion.

But the idea of using a buffer in between the stream and the range, and
possibly bolting together multiple transformations in a clean way, makes
this problem easy to solve, and I think it is closer to the vision
Andrei/Walter have.

In essence a transcoding filter for UTF-16 would wrap a buffer of ubyte
and itself present a buffer interface (but of wchar).

My intended interface allows you to specify the desired type per read.
Think of the case of stdin, where the clients will be varied and written
by many different people, and its interface is decided by Phobos.

But a transcoding buffer may make some optimizations. For instance,
reading a UTF-32 file as UTF-8 can re-use the same buffer, as no code point
uses more than 4 code units (did I get that right?).
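
A sketch of that in-place optimization, assuming native-endian UTF-32
data that is suitably aligned. The function name utf32ToUtf8InPlace is
hypothetical; std.utf.encode is the existing Phobos function. Each code
point occupies 4 bytes on input and at most 4 bytes as UTF-8, so the
write position can never pass the read position.

```d
import std.utf : encode;

// Transcodes UTF-32 stored in buf to UTF-8, reusing the same memory.
// Returns the UTF-8 text as a slice of the front of buf.
char[] utf32ToUtf8InPlace(ubyte[] buf)
{
    assert(buf.length % 4 == 0, "truncated UTF-32 input");
    auto src = cast(dchar[]) buf;   // read view: one code point per 4 bytes
    auto dst = cast(char[]) buf;    // write view over the same bytes
    size_t w = 0;                   // invariant: w <= 4 * i

    foreach (i; 0 .. src.length)
    {
        immutable dchar c = src[i]; // read before these bytes get overwritten
        char[4] tmp;
        immutable n = encode(tmp, c);   // n is between 1 and 4
        dst[w .. w + n] = tmp[0 .. n];
        w += n;
    }
    return dst[0 .. w];
}

void main()
{
    // 'a', U+00E9, U+20AC, U+1F600 need 1, 2, 3 and 4 UTF-8 code units.
    dchar[] text = "a\u00E9\u20AC\U0001F600"d.dup;
    char[] utf8 = utf32ToUtf8InPlace(cast(ubyte[]) text);

    assert(utf8 == "a\u00E9\u20AC\U0001F600");
    assert(utf8.length == 10);
}
```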

I am going to study your code some more and see how I can update my code
to use it. I still need to maintain the std.stdio.File interface, and
Walter is insistent that the initial state of stdout/err/in must be
synchronous with C (which kind of sucks, but I have plans on how to make
it not be so bad).

I'm seriously not seeing how interfacing with the C runtime could be fast
enough.

It's not. But an important stipulation in order for this to all be
accepted is that it doesn't break existing code that expects things like
printf and writef to interleave properly.

However, I think we can have an opt-in scheme, and there are certain cases
where we can proactively switch to a D-buffer scheme. For example, if you
get a ByLine range, it expects to exhaust the data from the stream, and may
not properly work with C printf.

The idea is that stdio.File can switch at runtime from FILE * to D streams
as needed or directed.

There is still a lot of work left to do, but I think one of the hard
parts is done, namely dealing with UTF transcoding. The remaining sticky
part is dealing with shared. But with structs, this should make things
much easier.


I'm thinking a generic locking wrapper is possible along the lines of:

shared Locked!(GenericBuffer!char) stdin; //usage

struct Locked(T){
shared:
private:
        T _this;
        Mutex mut;
public:
        //forwarded methods
}

The wrapper will introduce a lock, and implement every method of the
wrapped struct roughly like this:

mut.lock();
scope(exit) mut.unlock();
(cast(T*)&_this).method(args);

I'm sure it could be pretty automatic.

This would be a key addition for ANY type in order to properly work with
shared. BUT, I don't see how it works safely generically because you
necessarily have to cast away shared in order to call the methods. You
would have to limit this to only working on types it was intended for.

I've been expecting to have to do something like this, but not looking
forward to it :(
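
One way the forwarding could be automated is opDispatch. The following is
only a sketch under assumptions: the Counter type and the construction by
pointer cast are invented for the example, and the cast away from shared
is exactly the unchecked step discussed above, so it is only sound for
payload types that keep no unshared references to their own state.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

struct Locked(T)
{
    private T payload;
    private Mutex mut;

    this(T value)
    {
        payload = value;
        mut = new Mutex;
    }

    // Forwards any method of T, holding the lock for the duration of the call.
    auto opDispatch(string name, Args...)(Args args) shared
        if (__traits(hasMember, T, name))
    {
        auto m = cast(Mutex) mut;       // strip shared from the mutex reference
        m.lock();
        scope (exit) m.unlock();
        // Unchecked: casts away shared while the lock is held.
        return mixin("(cast(T*) &payload)." ~ name ~ "(args)");
    }
}

// Hypothetical payload type used only for this example.
struct Counter
{
    private int n;
    void bump() { ++n; }
    int get() const { return n; }
}

void main()
{
    // Build unshared, then publish as shared before any thread sees it.
    auto counter = cast(shared(Locked!Counter)*) new Locked!Counter(Counter(0));

    Thread[] workers;
    foreach (t; 0 .. 4)
        workers ~= new Thread({ foreach (i; 0 .. 1000) counter.bump(); }).start();
    foreach (w; workers)
        w.join();

    assert(counter.get() == 4000);
}
```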

One question, is there a reason a buffer type has to be a range at all?
I can see where it's easy to make it a range, but I don't see
higher-level code using the range primitives when dealing with chunks of
a stream.
Lexers/parsers enjoy it - i.e. they work pretty much as ranges,
especially when skipping spaces and the like. As I said, the main reason
was: if it fits as a range, why not? After all it makes one-pass processing
of data trivial as it rides on top of foreach:
foreach(octet; mybuffer)
{
        if(interesting(octet))
                do_cool_stuff();
}
Things like countUntil make perfect sense when called on a buffer (e.g. to
find a matching sentinel).
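
For illustration, here is countUntil locating a sentinel in a buffer
window, with a plain array slice standing in for the buffer. The helper
name bytesBefore and the sample data are assumptions for this sketch.

```d
import std.algorithm.searching : countUntil;

// Number of bytes before the first sentinel, or -1 if it is not buffered yet.
ptrdiff_t bytesBefore(const(ubyte)[] window, ubyte sentinel)
{
    return window.countUntil(sentinel);
}

void main()
{
    const(ubyte)[] window = cast(const(ubyte)[]) "key=value;rest";

    immutable end = bytesBefore(window, cast(ubyte) ';');
    assert(end == 9);
    assert(window[0 .. end] == cast(const(ubyte)[]) "key=value");

    // A missing sentinel tells the caller to load more data and retry.
    assert(bytesBefore(window, cast(ubyte) '#') == -1);
}
```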

I think I misstated my question. What I am curious about is why a type
must be a forward range to pass isBuffer. Of course, if it makes sense for
a buffer type to also be a range, it can certainly implement that
interface as well. But I don't know that I would need those primitives in
all cases. I don't have any specific use case for having a buffer that
doesn't implement a range interface, but I am hesitant to necessarily
couple the buffer interface to ranges just because we can't think of a
counter-case :)
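
To make the question concrete, a decoupled trait might look like the
sketch below. The trait name isBufferOnly and the primitives window,
extend and discard are placeholders, not the primitives isBuffer actually
checks in the proposal; the point is only that the buffer check and the
range check can be written separately.

```d
import std.range.primitives : isForwardRange;

// Hypothetical trait: requires buffer primitives only, no range interface.
enum isBufferOnly(T) = is(typeof((T b) {
    auto w = b.window();        // view of the buffered data
    size_t got = b.extend(1);   // ask for more data
    b.discard(1);               // drop consumed data
}));

// An array-backed buffer that deliberately offers no range primitives.
struct ArrayBuffer
{
    private const(ubyte)[] data;

    const(ubyte)[] window() const { return data; }
    size_t extend(size_t n) { return 0; }   // no stream behind it to pull from
    void discard(size_t n) { data = data[n .. $]; }
}

static assert(isBufferOnly!ArrayBuffer);
static assert(!isForwardRange!ArrayBuffer); // a buffer without being a range
static assert(!isBufferOnly!(int[]));       // a range without being a buffer

void main()
{
    auto b = ArrayBuffer([1, 2, 3]);
    b.discard(1);
    assert(b.window().length == 2);
    assert(b.window()[0] == 2);
}
```

Code that wants both can combine the two checks in its own constraint.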


-Steve
