On Thu, 16 Jan 2014 13:44:08 -0500, Dmitry Olshansky
<[email protected]> wrote:
16-Jan-2014 19:55, Steven Schveighoffer wrote:
On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
<[email protected]> wrote:
Then our goals are aligned. Be sure to take a peek at (if you haven't
already):
https://github.com/schveiguy/phobos/blob/new-io/std/io.d
Yes, I'm gearing up to revisit that after a long D hiatus, and I came
across this thread.
At this point, I really really like the ideas that you have in this. It
solves an issue that I struggled with, and my solution was quite clunky.
I am thinking of this layout for streams/buffers:
1. Unbuffered stream used for raw i/o, based on a class hierarchy (which
I have pretty much written)
2. Buffer like you have, based on a struct, with specific primitives.
Its job is to collect data from the underlying stream, and present it
to consumers as a random-access buffer.
The only interesting thing I'd add here is that some buffers may work
without an underlying stream. The best examples are arrays and MM-files.
Yes, but I would stress that for convenience, the buffer should forward
some of the stream primitives (such as seeking) in cases where seeking is
possible, at least in the case of a buffer that wraps a stream.
That actually is another point that would have sucked with my class-based
solution -- allocating a class to use an array as backing.
3. Filter that has access to transform the buffer data/copy it.
4. Ranges that use the buffer/filter to process/present the data.
Yes, yes and yes. I find it surprisingly good to see our vision seems to
match. I was half-expecting you'd come along and destroy it all ;)
:) I've been preaching for a while that ranges don't make good streams,
and that streams should be classes, but I hadn't considered splitting out
the buffer. I think it's the right balance.
The problem I struggled with is the presentation of UTF data of any
format as char[], wchar[], or dchar[]. Two things need to happen. First is
that the data needs to be post-processed to perform any necessary byte
swapping. The second is to transcode the data into the correct width.
In this way, you can process UTF data of any type (I even have code to
detect the encoding and automatically process it), and then use it in a
way that makes sense for your code.
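The two steps described above (byte swapping, then transcoding) might look roughly like this for big-endian UTF-16 input. This is only a minimal sketch: the helper name utf16beToUtf8 is hypothetical and is not part of the linked std/io.d, and it works on a whole array rather than on a stream buffer.

```d
import std.utf : toUTF8;

/// Hypothetical helper: decode big-endian UTF-16 bytes into a UTF-8 string.
string utf16beToUtf8(const(ubyte)[] raw)
{
    assert(raw.length % 2 == 0, "UTF-16 input must have an even byte count");

    // Step 1: byte swapping. Assemble each code unit from its two bytes,
    // high byte first, so the result is correct on any host endianness.
    auto units = new wchar[raw.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)((raw[2 * i] << 8) | raw[2 * i + 1]);

    // Step 2: transcoding to the width the consumer asked for (UTF-8 here).
    return toUTF8(units);
}
```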
My solution was to paste in a "processing" delegate into the class
hierarchy of buffered streams that allowed one read/write access to the
buffer. But it's clunky, and difficult to deal with in a generalized
fashion.
But the idea of using a buffer in between the stream and the range, and
possibly bolting together multiple transformations in a clean way, makes
this problem easy to solve, and I think it is closer to the vision
Andrei/Walter have.
In essence a transcoding filter for UTF-16 would wrap a buffer of ubyte
and itself present a buffer interface (but of wchar).
My intended interface allows you to specify the desired type per read.
Think of the case of stdin, where the clients will be varied and written
by many different people, and its interface is decided by Phobos.
But a transcoding buffer may make some optimizations. For instance,
reading a UTF-32 file as UTF-8 can re-use the same buffer, as no code point
uses more than 4 code units (did I get that right?).
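A rough sketch of that buffer re-use, assuming the whole UTF-32 input is already in memory as a dchar[]. The function name transcodeInPlace is hypothetical. It relies on each code point occupying 4 bytes on input and at most 4 bytes on output, so the write position never overtakes the read position.

```d
import std.utf : encode;

/// Hypothetical helper: transcode UTF-32 to UTF-8 inside the same storage
/// and return the UTF-8 view of it.
char[] transcodeInPlace(dchar[] buf)
{
    // Same memory, viewed as 4 chars per dchar.
    auto bytes = cast(char[]) buf;
    size_t w = 0; // write position, in bytes

    foreach (i; 0 .. buf.length)
    {
        dchar c = buf[i];                // copy out before overwriting
        char[4] tmp;
        immutable n = encode(tmp, c);    // 1 to 4 UTF-8 code units
        bytes[w .. w + n] = tmp[0 .. n]; // w + n <= 4 * (i + 1)
        w += n;
    }
    return bytes[0 .. w];
}
```

Note that std.utf.encode throws a UTFException on an invalid code point, so a real filter would need a policy for malformed input.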
I am going to study your code some more and see how I can update my code
to use it. I still need to maintain the std.stdio.File interface, and
Walter is insistent that the initial state of stdout/err/in must be
synchronous with C (which kind of sucks, but I have plans on how to make
it not be so bad).
I'm seriously not seeing how interfacing with the C runtime could be fast
enough.
It's not. But an important stipulation in order for this to all be
accepted is that it doesn't break existing code that expects things like
printf and writef to interleave properly.
However, I think we can have an opt-in scheme, and there are certain cases
where we can proactively switch to a D-buffer scheme. For example, if you
get a ByLine range, it expects to exhaust the data from the stream, and may
not properly work with C printf.
The idea is that stdio.File can switch at runtime from FILE * to D streams
as needed or directed.
There is still a lot of work left to do, but I think one of the hard
parts is done, namely dealing with UTF transcoding. The remaining sticky
part is dealing with shared. But with structs, this should make things
much easier.
I'm thinking a generic locking wrapper is possible along the lines of:
shared Locked!(GenericBuffer!char) stdin; //usage
struct Locked(T){
shared:
private:
    T _this;
    Mutex mut;
public:
    //forwarded methods
}
The wrapper will introduce a lock, and implement every method of wrapped
struct roughly like this:
mut.lock();
scope(exit) mut.unlock();
(cast(T*)&_this).method(args);
I'm sure it could be pretty automatic.
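One way the forwarding might be automated is with opDispatch. The following is only a sketch under assumptions: the field names, the shared constructor, and the Counter payload type are all hypothetical, and it assumes a druntime recent enough that core.sync.mutex.Mutex has shared lock/unlock overloads.

```d
import core.sync.mutex : Mutex;

/// Sketch of a locking wrapper; forwarding via opDispatch is an assumption.
struct Locked(T)
{
    private T _payload;
    private Mutex _mut;

    this(T payload) shared
    {
        _payload = cast(shared) payload;
        _mut = new shared Mutex;
    }

    /// Forwards any method call to the payload while holding the mutex.
    auto opDispatch(string name, Args...)(Args args) shared
    {
        _mut.lock();
        scope (exit) _mut.unlock();
        // Casting away shared is only sound because the lock is held and
        // T is assumed not to leak references to its internals.
        return mixin("(cast(T*) &_payload)." ~ name ~ "(args)");
    }
}

/// Hypothetical payload type, used only to exercise the wrapper.
struct Counter
{
    int n;
    void inc() { ++n; }
    int get() { return n; }
}
```

Usage would be along the lines of `auto c = shared(Locked!Counter)(Counter(0)); c.inc();`. Nothing here stops T from returning a reference into its own state, which then escapes the lock.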
This would be a key addition for ANY type in order to properly work with
shared. BUT, I don't see how it works safely generically because you
necessarily have to cast away shared in order to call the methods. You
would have to limit this to only working on types it was intended for.
I've been expecting to have to do something like this, but not looking
forward to it :(
One question, is there a reason a buffer type has to be a range at all?
I can see where it's easy to make it a range, but I don't see
higher-level code using the range primitives when dealing with chunks of
a stream.
Lexers/parsers enjoy it - i.e. they work pretty much as ranges
especially when skipping spaces and the like. As I said the main reason
was: if it fits as range why not? After all it makes one-pass processing
of data trivial as it rides on top of foreach:
foreach(octet; mybuffer)
{
    if(interesting(octet))
        do_cool_stuff();
}
Things like countUntil make perfect sense when called on a buffer (e.g. to
find a matching sentinel).
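To make the range argument concrete, here is a minimal sketch of an array-backed buffer that exposes only the forward-range primitives. ArrayBuffer, lineLength, and countSpaces are hypothetical names; this is not the isBuffer interface from the linked code.

```d
import std.algorithm.searching : countUntil;
import std.range.primitives : isForwardRange;

/// Hypothetical array-backed buffer exposing only forward-range primitives.
struct ArrayBuffer
{
    const(ubyte)[] data;

    @property bool empty() const { return data.length == 0; }
    @property ubyte front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property ArrayBuffer save() { return this; }
}

static assert(isForwardRange!ArrayBuffer);

/// Counts octets before the first newline sentinel, or -1 if there is none.
ptrdiff_t lineLength(ArrayBuffer buf)
{
    return buf.countUntil(cast(ubyte) '\n');
}

/// One-pass processing with foreach: count the space octets.
size_t countSpaces(ArrayBuffer buf)
{
    size_t n;
    foreach (octet; buf)
        if (octet == ' ')
            ++n;
    return n;
}
```

Because the struct is passed by value, both functions work on a copy and leave the caller's buffer position untouched.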
I think I misstated my question. What I am curious about is why a type
must be a forward range to pass isBuffer. Of course, if it makes sense for
a buffer type to also be a range, it can certainly implement that
interface as well. But I don't know that I would need those primitives in
all cases. I don't have any specific use case for having a buffer that
doesn't implement a range interface, but I am hesitant to necessarily
couple the buffer interface to ranges just because we can't think of a
counter-case :)
-Steve