On Nov 16, 2012, at 7:49 AM, bioinfornatics <[email protected]> 
wrote:

> hi,
> 
> I would like to count the occurrences of one or more letters in a string or a
> list of strings (string[]) without using a for loop, using std.algorithm
> instead to compute it efficiently.
> 
> if you have:
> string   seq1 = "ACGATCGATCGATCGCGCTAGCTAGCTAG";
> string[] seq2 = ["ACGATCGATCGATCGCGCTAGCTAGCTAG", "ACGATGACGATCGATGCTAGCTAG"];
> 
> I tried:
> 
> reduce!( (seq) => seq.count("G"), seq.count("C"))(tuple(0LU,0LU),seq1)

D has map and reduce but not MapReduce, so this approach feels a bit unnatural.
Assuming ASCII characters and a reasonably sized sequence, here's the simplest
approach:

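        import std.algorithm, std.stdio;

        // sort() needs a mutable random-access range, hence the byte copy.
        auto seq1 = cast(byte[])("ACGATCGATCGATCGCGCTAGCTAGCTAG".dup);

        // group() yields (value, run length) tuples over the sorted data.
        foreach(e; group(sort(seq1))) {
                writefln("%s occurs %s times", cast(char) e[0], e[1]);
        }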

For real code, the right approach depends on the number of distinct values, how
dense the set of values is, and the total number of elements to evaluate. For
English letters, the fastest option is likely an int[26] indexed by letter. For
a more diverse set of inputs, use a hash table. For a huge input, something
like MapReduce is appropriate.
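
As a rough illustration of the first two options, here is a minimal sketch
assuming uppercase ASCII input for the table version. The names countLetters
and countAny are made up for this example, and size_t is used for the counters
where the text says int; anything outside 'A'..'Z' is simply skipped by the
table version.

```d
import std.stdio : writefln;

// Fixed table: slot 0 holds the count for 'A', slot 25 for 'Z'.
// Characters outside 'A'..'Z' are ignored (an assumption of this sketch).
size_t[26] countLetters(const(char)[] seq)
{
    size_t[26] counts; // integers default-initialize to 0 in D
    foreach (char c; seq)
    {
        if (c >= 'A' && c <= 'Z')
            ++counts[c - 'A'];
    }
    return counts;
}

// Hash table: a built-in associative array keyed by code point,
// for input with a larger or unknown set of distinct values.
size_t[dchar] countAny(const(char)[] seq)
{
    size_t[dchar] counts;
    foreach (dchar c; seq) // decodes UTF-8 into code points
        ++counts[c];       // a missing key starts from size_t.init (0)
    return counts;
}

void main()
{
    auto seq1 = "ACGATCGATCGATCGCGCTAGCTAGCTAG";

    auto table = countLetters(seq1);
    foreach (i, n; table)
    {
        if (n > 0)
            writefln("%s occurs %s times", cast(char)('A' + i), n);
    }

    // Both approaches should agree on any letter present in the input.
    auto hashed = countAny(seq1);
    assert(hashed['G'] == table['G' - 'A']);
}
```

The table version does one array increment per character and no sorting or
hashing, which is why it tends to win for a small, dense alphabet. The
associative array trades some speed for handling arbitrary characters.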
