bearophile wrote:
Does someone has some need for Ternary Search Trees into Phobos (for D1. And 
eventually later for D2 too)?
TSTs allow to find keys, key prefixes, or even keys with holes. Keys are arrays 
of T, where T is the template type.
They can be designed to store the keys alone, or as an associative data 
structure.

With some benchmarks I have seen that a simple TST implementation is about as 
fast as the built-in AAs of D (but much slower than Python dicts).

Bye,
bearophile

I implemented a version of the TST you posted using Tango + D1... Here are my results:

Test                Part                Mean      Median    Max
----                ----                ----      ------    ---
TST                 Insertion           0.0428    0.0338    0.0886
TST                 Iteration           0.0022    0.0022    0.0024
TST                 Lookup              0.0225    0.0223    0.0237
HashMap             Insertion           0.0621    0.0421    0.2205
HashMap             Iteration           0.0035    0.0034    0.0036
HashMap             Lookup              0.0169    0.0168    0.0184
TreeMap             Insertion           0.1045    0.1041    0.1058
TreeMap             Iteration           0.0041    0.0041    0.0044
TreeMap             Lookup              0.0895    0.0892    0.0917
AssocArray          Insertion           0.0262    0.0262    0.0268
AssocArray          Iteration           0.0015    0.0015    0.0016
AssocArray          Lookup              0.0130    0.0129    0.0132

(TreeMap and HashMap are in tango,util.container, AssocArray is the built-in D associative array; testing code is attached. Compiled with -O -release -inline using DSSS 0.78 and DMD 1.043 on Windows XP x86 SP3).

Indeed, TSTs seem to beat Red-Black BSTs (the implementation Tango's TreeMap uses) quite handily, and have faster insert times than hash maps, though slower lookup. The big winner here, though, appears to be D's built-in associative arrays. I thought they were supposed to be very slow, but Tango's implementation, at least, looks pretty good (without any re-hashing).

For order-sensitive collections, I'm definitely using TSTs; I'm sold on the concept. Not only are they faster, allowing prefix search could be very useful. However, the times here are very low (the word file is only 1.1MB, and my system is fairly fast, though this is memory-dependent not CPU-dependent)... I'll try with a larger word list & see what results I get.
module candy.misc.dstest;

import tango.io.stream.Lines;
import tango.io.device.File;
import tango.time.StopWatch;
import tango.io.Stdout;
import tango.core.Memory;
import tango.core.Array;

import tango.util.collection.HashMap;
import tango.util.collection.TreeMap;

import candy.util.array;
import candy.util.tst;

public struct TestPart
{
        char[] name;
        void delegate() run;
}

public class Test
{
        public char[] name;
        public TestPart[] parts;
}

public static T median(T)(T[] values)
{
        assert(values.length > 0);
        uint mid = values.length / 2;
        return (values.length % 2 == 0) ? ((values[mid - 1] + values[mid]) / 2) 
: values[mid];
}

public static T mean(T)(T[] values)
{
        T total = 0;
        foreach(value; values)
                total += value;
        return total / values.length;
}

public void doTest(char[] sectionName, uint iterations, Test[] tests)
{
        Stdout.format("Running tests for {0}...", sectionName).newline();
        Stdout("Test                Part                Mean      Median    
Max").newline();
        Stdout("----                ----                ----      ------    
---").newline();
        foreach(test; tests)
        {
                double[][] times = new double[][test.parts.length];
                foreach(j, part; test.parts)
                        times[j] = new double[iterations];
                for(int i = 0; i < iterations; i++)
                {
                        foreach(j, part; test.parts)
                        {
                                sw.start();
                                part.run();
                                times[j][i] = sw.stop();
                        }
                        GC.collect();
                }
                foreach(j, part; test.parts)
                {
                        times[j].sort();
                        
Stdout.format("{0,-20}{1,-20}{2,-10:f4}{3,-10:f4}{4,-10:f4}", test.name, 
part.name, mean(times[j]), median(times[j]), times[j][$ - 1]).newline();
                }
        }
        
}

private StopWatch sw;
public const uint SIZE = 1024 * 1000;
public const uint ITERS = 9;
public char[][] words, words2;

public int main()
{
        Stdout.flush(true);
        
        generateDictionary();
        GC.collect();
        Test[] dictTests;
        dictTests ~= new DictTest!("TST", TernaryTree!(char, bool), "new 
TernaryTree!(char, bool)");
        dictTests ~= new DictTest!("HashMap", HashMap!(char[], bool), "new 
HashMap!(char[], bool)");
        dictTests ~= new DictTest!("TreeMap", TreeMap!(char[], bool), "new 
TreeMap!(char[], bool)");
        dictTests ~= new DictTest!("AssocArray", bool[char[]], "", "dict[word] 
= true;", "result = dict[word];");
        doTest("dictionaries", ITERS, dictTests);
        
        return 0;
}

public void generateDictionary()
{
        Array!(char[]) temp;
        foreach(line; new Lines!(char)(new File("wordsEn.txt")))
                if(line.length)
                        temp ~= line;
        words = temp.toArray();
        words2 = temp.toArray().dup;
        words.shuffle();
        words2.shuffle();
}

public class DictTest(char[] testName, T, char[] create, 
        char[] insertElem = "dict.add(word, true);",
        char[] getElem = "dict.get(word, result);") : Test
{
        private T dict;
        
        public this()
        {
                dict = null;
                name = testName;
                parts = [TestPart("Insertion", &insert),
                         TestPart("Iteration", &iterate),
                         TestPart("Lookup", &lookup)];
        }
        
        public void insert()
        {
                static if(create.length)
                        dict = mixin(create);
                foreach(word; words)
                        mixin(insertElem);
        }
        
        public void iterate()
        {
                foreach(word, check; dict)
                        if(!check)
                                throw new Exception("");
        }
        
        public void lookup()
        {
                foreach(word; words2)
                {
                        bool result;
                        mixin(getElem);
                        if(!result)
                                throw new Exception("");
                }
                dict = null;
        }
}

Reply via email to