On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
I need to parse an ASCII file containing multiple tokens. The tokens can be seen as keys; after every token there is a bunch of lines belonging to that token, the values.
The order of tokens is unknown.

I would like to read the file in as a whole string, and split the string with:
splitter(fileString, [token1, token2, ... tokenN]);

And would like to get a range of strings each starting with tokenX and ending before the next token.

Does something like this exist?

I know how to parse the string line by line, creating new strings and appending the appropriate lines, but I don't know how to do this with a lazy result range and without new allocations.

Without more detail, it's a bit hard to help.

std.algorithm.splitter has an overload that takes a function instead of a separator:

    import std.algorithm;
    auto a = "a,b;c";
    auto b = a.splitter!(e => e == ';' || e == ',');
    assert(equal(b, ["a", "b", "c"]));

However, not only are the separators lost in the process, but it also only allows single-element separators. This might be good enough given the information you've divulged, but I'll hazard a guess it isn't.
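For comparison, the split-but-keep-the-tokens behaviour you describe is easy to mock up with a regex that has a capturing group. A rough Python sketch (not D; the token names and input are made up for illustration):

```python
import re

# Hypothetical tokens and input, for illustration only.
tokens = ["token1", "token2"]
text = "token1 line1 line2 token2 line3 token1 line4"

# A capturing group makes re.split keep the matched tokens in the output.
pattern = "(" + "|".join(map(re.escape, tokens)) + ")"
pieces = re.split(pattern, text)

# Skip anything before the first token, then glue each token to its body.
chunks = [pieces[i] + pieces[i + 1] for i in range(1, len(pieces) - 1, 2)]
# chunks == ['token1 line1 line2 ', 'token2 line3 ', 'token1 line4']
```

This is eager rather than lazy, but it shows the shape of the result you're after: one chunk per token, each starting with the token itself.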

My next stop is std.algorithm.chunkBy:

    import std.algorithm;
    import std.typecons;

    auto a = ["a", "b", "c", "d", "e"];
    auto b = a.chunkBy!(e => e == "a" || e == "d");
    // b has essentially these contents:
    auto result = [
        tuple(true, ["a"]), tuple(false, ["b", "c"]),
        tuple(true, ["d"]), tuple(false, ["e"])
    ];

No assert here, since the ranges in the tuples are not arrays. My immediate concern is that two consecutive tokens with no intervening values will mess it up, and the result looks a bit messy anyway. Here is something a little more involved, which according to the documentation is not guaranteed to work (the predicate below keeps state in static variables, and chunkBy makes no promises about impure predicates):
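As a point of comparison, Python's itertools.groupby does roughly what chunkBy with a unary predicate does here, including producing (key, group) pairs. A sketch (not D):

```python
from itertools import groupby

# Rough analogue of the chunkBy call above: group consecutive elements
# by whether they are a token ("a" or "d" in this toy example).
a = ["a", "b", "c", "d", "e"]
b = [(k, list(g)) for k, g in groupby(a, key=lambda e: e in ("a", "d"))]
# b == [(True, ['a']), (False, ['b', 'c']), (True, ['d']), (False, ['e'])]
```

It has the same weakness: a token and the values after it land in separate groups, and you still have to stitch them back together.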

bool isToken(string s) {
    return s == "a" || s == "d";
}

bool tokenCounter(string s) {
    // Flips between true and false each time a new token shows up, so a
    // token and the values that follow it share the same key.
    static string oldToken;
    static bool counter = true;
    if (s.isToken && s != oldToken) {
        oldToken = s;
        counter = !counter;
    }
    return counter;
}

unittest {
    import std.algorithm;
    import std.stdio;
    import std.typecons;
    import std.array;

    auto a = ["a","b","c", "d", "e", "a", "d"];
    auto b = a.chunkBy!tokenCounter.map!(e=>e[1]);
    auto result = [
        ["a", "b", "c"],
        ["d", "e"],
        ["a"],
        ["d"]
        ];
    writeln(b);
    writeln(result);
}

Again no assert, but b and result have essentially the same contents. This version also handles consecutive tokens neatly (though consecutive identical tokens will be grouped together).
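The same stateful-predicate trick can be sketched in Python, with a closure standing in for the D function's static variables (names are my own):

```python
from itertools import groupby

def is_token(s):
    return s in ("a", "d")

def make_token_counter():
    # Closure state plays the role of the static variables in the D version.
    state = {"old": None, "flag": True}
    def key(s):
        if is_token(s) and s != state["old"]:
            state["old"] = s
            state["flag"] = not state["flag"]
        return state["flag"]
    return key

a = ["a", "b", "c", "d", "e", "a", "d"]
groups = [list(g) for _, g in groupby(a, key=make_token_counter())]
# groups == [['a', 'b', 'c'], ['d', 'e'], ['a'], ['d']]
```

As in the D version, each group starts with its token, consecutive different tokens are kept apart, and consecutive identical tokens would be merged.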

Hope this helps.

--
  Simen
