On Fri, 18 May 2012 03:52:51 -0400, Mehrdad <wfunct...@hotmail.com> wrote:
> On Thursday, 17 May 2012 at 14:02:09 UTC, Steven Schveighoffer wrote:
>> 2. I realized, buffering input stream of type T is actually an input
>> range of type T[].
> The trouble is, why a slice? Why not an std.array.Array? Why not some
> other data source?
> (Chicken/egg problem....)
Well, because that's what I/O buffers are :) There isn't an OS primitive
that reads a file descriptor into, say, a linked list. Anything other
than a slice would have to go through a translation step.

I don't know what std.array.Array is.
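
That contiguous-buffer shape is visible directly in std.stdio: rawRead fills a caller-supplied buffer via the OS read primitive and returns the slice of it that was actually filled. A minimal sketch (assuming a file named "data.txt" exists):

```d
import std.stdio;

void main()
{
    auto f = File("data.txt");
    ubyte[4096] buffer;                  // one contiguous I/O buffer
    // rawRead fills the buffer from the file descriptor and returns
    // the slice of it that was actually filled.
    ubyte[] chunk = f.rawRead(buffer[]);
    writeln(chunk.length, " bytes read");
}
```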
> Another problem I've noticed is the following:
> Say you're tokenizing some input range, and it happens to just be a
> huge, gigantic string.
> It *should* be possible to turn it into tokens with slices referring to
> the ORIGINAL string, which is VERY efficient because it doesn't require
> *any* heap allocations whatsoever. (You just tokenize with opApply() as
> you go, without ever requiring a heap allocation...)
> However, this is *only* possible if you don't use the concept of an
> input range!
How so? A slice is an input range, and so is a string.
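
For instance, a whitespace tokenizer over a string can hand every token out as a slice of the original, with no heap allocation at all. A minimal sketch (the sink-delegate shape stands in for the opApply style mentioned above):

```d
// Each token passed to `sink` is a slice of `input` -- no copying,
// no heap allocation on the tokenizer's part.
void tokenize(string input, scope void delegate(string token) sink)
{
    size_t start = 0;
    foreach (i, c; input)        // iterate code units with their index
    {
        if (c == ' ')
        {
            if (i > start)
                sink(input[start .. i]);
            start = i + 1;
        }
    }
    if (start < input.length)
        sink(input[start .. $]);
}

unittest
{
    string src = "lex this string";
    string[] seen;
    tokenize(src, (t) { seen ~= t; });
    assert(seen == ["lex", "this", "string"]);
    // The tokens really alias the original string:
    assert(seen[0].ptr == src.ptr);
}
```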
> Since you can't slice an input range, you'd be forced to use the front()
> and popFront() properties. But, as soon as you do that, you're gonna
> have to store the data somewhere... so your next-best option is to
> append it to some new gigantic array (instead of a bunch of small
> arrays, which require a lot of heap allocations), but even then, it's
> not as efficient as possible, because there's O(n) extra memory involved
> -- which defeats the whole purpose of working on small chunks at a time
> with no heap allocations.
> (If you're going to do that, after all, you might as well read the
> entire thing into a giant string at the beginning, and work with an
> array anyway, discarding the whole idea of a range while doing your
> tokenization.)
> Any ideas on how to solve this problem?
I think I get what you are saying here -- if you are processing, say, an
XML file, and you want to split it into tokens, you have to dup each
token out of the stream, because the buffer may be reused.

But doing the same thing for a string would be wasteful.

I think in these cases, we need two types of parsing. One is to process
the stream as it's read into a temporary buffer; if you need data from
the temporary buffer beyond the scope of the processing loop, you dup
it.

The other way is to read the entire file/stream into a buffer, then
process that buffer with the knowledge that it's never going to change.

We could probably have the buffer identify which situation it's in, so
the code can make a runtime decision on whether to dup or not.
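
Something like the following could express that idea (all names here are hypothetical, just a sketch): the buffer carries a flag saying whether its contents are stable, and token extraction consults it.

```d
struct TokenBuffer
{
    const(char)[] data;
    bool transient;  // true: contents get overwritten on the next refill

    // Return a token covering data[lo .. hi]; copy only when the
    // underlying storage is going to be reused.
    const(char)[] token(size_t lo, size_t hi)
    {
        auto slice = data[lo .. hi];
        return transient ? slice.idup : slice;
    }
}
```

A stream-backed buffer would set transient = true and pay for the dup; a whole-file or in-memory string buffer would leave it false and hand out slices for free.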
-Steve