More on "Component Programming"

bearophile Mon, 27 May 2013 14:40:23 -0700

This simple task on the Rosettacode site is useful to show some uses of Phobos and the "component programming" recently discussed by Walter (other languages use a different name to denote the same idea).

Given a dictionary file of different words, it asks to find any of the longest anagram pairs that also share no equal chars in the same position (so they are named deranged anagrams):


http://rosettacode.org/wiki/Anagrams/Deranged_anagrams#D
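To make the two conditions concrete, here is a minimal sketch of both tests on a single pair of words. The helper names isAnagram and isDeranged are made up for this illustration and do not come from the Rosettacode entry; like the program below, the sketch assumes ASCII input.

```d
import std.algorithm : all, equal, sort;
import std.range : zip;
import std.string : representation;

// Hypothetical helper: two words are anagrams when their sorted bytes match.
bool isAnagram(string a, string b) {
    return equal(a.dup.representation.sort(), b.dup.representation.sort());
}

// Hypothetical helper: "deranged" means no position holds the same char in both words.
bool isDeranged(string a, string b) {
    return a.length == b.length
        && zip(a.representation, b.representation).all!(t => t[0] != t[1]);
}

void main() {
    // "abc" and "bca" differ at every position.
    assert(isAnagram("abc", "bca") && isDeranged("abc", "bca"));
    // "abc" and "acb" are anagrams, but both start with 'a'.
    assert(isAnagram("abc", "acb") && !isDeranged("abc", "acb"));
}
```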

There are many ways to do this in D+Phobos. The following solution is long, but it's quite fast (the "warmed up" run-time is only about 0.03 seconds with a dictionary of about 200 KB, on an old CPU core). I have chosen it over simpler solutions because it gives me a chance to discuss certain things:




import std.stdio, std.file, std.algorithm, std.string,
       std.typecons, std.range, std.functional;

auto findDeranged(in string[] words) pure /*nothrow*/ {

    // return words.pairwise.filter!(ww => ww[].zip.all!q{ a[0] != a[1] });

    Tuple!(string, string)[] result;
    foreach (immutable i, const w1; words)
        foreach (const w2; words[i + 1 .. $])
            if (zip(w1, w2).all!q{ a[0] != a[1] })
                result ~= tuple(w1, w2);
    return result;
}

void main() {
    Appender!(string[])[30] wClasses;

    foreach (word; std.algorithm.splitter("unixdict.txt".readText))
        wClasses[$ - word.length] ~= word;

    "Longest deranged anagrams:".writeln;

    foreach (words; wClasses[].map!q{ a.data }.filter!(not!empty)) {
        string[][const ubyte[]] anags; // Assume ASCII input.
        foreach (w; words)
            anags[w.dup.representation.sort().release.idup] ~= w;
        auto pairs = anags.byValue.map!findDeranged.join;
        if (!pairs.empty)
            return writefln("  %s, %s", pairs.front[]);
    }
}


- - - - - - - - - - - -

That program contains five foreach loops. Foreach loops are not evil and I like them, but for a certain kind of programming (discussed recently by Walter, and also common in F# and other languages) every time you use a for/foreach it's one small "failure" for the standard library :-)

The following weird (untested and maybe buggy) program replaces all the foreach loops with higher order functions and other library functions. It can't be compiled because it uses some things not yet present in Phobos (on the Rosettacode page there is also a slower and simpler D solution of this problem that uses only one foreach):



void main() {
    import std.stdio, std.file, std.algorithm, std.string,
           std.typecons, std.range, std.functional;

    "unixdict.txt"
    .readText
    .splitter
    .classify!q{ a.length }
    .map!q{ a.values } // .byValue is almost OK.
    .array
    .schwartzSort!q{ -a[0].length }
    .release
    .map!(words => words
                   .classify!q{ a
                                .dup
                                .representation
                                .sort()
                                .release
                                .idup }
                   .byValue
                   .map!(words => words
                                  .pairwise
                                  .filter!(ww => ww[]
                                                 .zip
                                                 .all!q{ a[0] != a[1] }))
                   .join)
    .filter!(not!empty)
    .front[]
    .binaryReverseArgs!writefln("  %s, %s");
}

A copy of the same code, in case the newsgroup has messed up the formatting and indents, turning that code into a soup:

http://codepad.org/L4TyDkcQ

I am not suggesting that you write whole D script-like programs in this strange style. But I think Phobos should offer all the tools to write a program like this, because even if you don't want to write a whole little program in this style, you sometimes want to use some parts of it, so I think all the most common and distinct micro-patterns should be contained in Phobos.


- - - - - - - - - - - -

"binaryReverseArgs" is in the std.functional module. Here it allows the use of writefln in UFCS style, inverting the formatting string position. I think I'd like a shorter and more handy name for it. In Haskell it's named "flip", and its usage is not uncommon.
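As a rough illustration of the idea, the following sketch defines a two-argument flip by hand and uses it in UFCS style. The names flip and label are hypothetical and are not Phobos symbols; the point is only to show how swapping the arguments lets a value that is normally the second argument lead the chain.

```d
import std.conv : text;
import std.stdio : writeln;

// Hypothetical helper named after Haskell's flip; not a Phobos symbol.
template flip(alias fun) {
    auto flip(A, B)(A a, B b) {
        return fun(b, a); // call fun with the two arguments swapped
    }
}

// Example function whose natural argument order is (prefix, value).
string label(string prefix, int value) {
    return text(prefix, value);
}

void main() {
    int answer = 42;
    // Direct call: the prefix has to come first.
    assert(label("n = ", answer) == "n = 42");
    // Flipped call: the value leads the UFCS chain.
    assert(answer.flip!label("n = ") == "n = 42");
    answer.flip!label("n = ").writeln;
}
```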


- - - - - - - - - - - -

"classify" is a simple function that, given a forward range of T and an optional function T->K, returns an associative array T[][K]. (Arrays are used by default as values, but maybe you can optionally specify a different type of values, like Appenders, Arrays, sets, etc.) Currently in Phobos the only function to build an associative array is std.array.assocArray, but here we need something different (http://d.puremagic.com/issues/show_bug.cgi?id=5502 ).


[1, 7, 6, 3, 2].classify!(x => x % 2 ? "odd": "even").writeln;

==>
["odd": [1, 7, 3], "even": [6, 2]]
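A minimal sketch of how such a function could be written today follows. It is only one possible reading of the enhancement request: the key function is mandatory here, the values are always plain arrays, and it accepts any input range rather than requiring a forward range. The order in which an associative array prints its keys is not specified, so the sketch checks the groups by key instead of comparing the printed text.

```d
import std.range : ElementType, isInputRange;

// Hypothetical sketch of the proposed "classify"; not a Phobos function.
auto classify(alias keyOf, R)(R items) if (isInputRange!R) {
    alias T = ElementType!R;
    alias K = typeof(keyOf(T.init));
    T[][K] groups;
    foreach (item; items)
        groups[keyOf(item)] ~= item; // appending creates the entry on first use
    return groups;
}

void main() {
    auto groups = [1, 7, 6, 3, 2].classify!(x => x % 2 ? "odd" : "even");
    assert(groups["odd"] == [1, 7, 3]);
    assert(groups["even"] == [6, 2]);
}
```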

- - - - - - - - - - - -

"pairwise" is a very useful lazy range similar to cartesianProduct, but it yields only the ordered pairs, so they cover only about half (a triangle) of the square matrix of the possibilities (http://d.puremagic.com/issues/show_bug.cgi?id=6788 ).



This simple example shows the difference:

import std.stdio, std.algorithm;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (xy; cartesianProduct(data, data))
        writeln(xy);
}


Generates the tuples:
(1, 1)
(2, 1)
(3, 1)
(4, 1)
(1, 2)
(2, 2)
(3, 2)
(4, 2)
(1, 3)
(2, 3)
(3, 3)
(4, 3)
(1, 4)
(2, 4)
(3, 4)
(4, 4)


While:

import std.stdio, std.range;
void main() {
    auto data = [1, 2, 3, 4];
    foreach (tup; pairwise(data))
        writeln(tup);
}


Should generate:
(1, 2)
(1, 3)
(1, 4)
(2, 3)
(2, 4)
(3, 4)

In the Python standard library there is a lazy generator that's more general than pairwise:

from itertools import combinations
list(combinations([1, 2, 3, 4], 2))

[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


So if you prefer that more general solution the D code becomes:

...
                   .map!(words => words
                                  .combinations(2)
                                  .filter!(ww => ww[]
...

Bye,
bearophile
