On 17.07.2011 0:41, Willy Martinez wrote:
== Quote from Dmitry Olshansky (dmitry.o...@gmail.com)'s article
If you wish to avoid storing all of this in an array by using e.g.
filter _and_ use Boyer-Moore search on it then: No, you can't do that.
The reason is that filter is a ForwardRange, with the important consequence
that you can't look at an arbitrary Nth element in O(1). And Boyer-Moore
requires such access to be anywhere near efficient.
Why doesn't filter provide O(1) random access? Because to get the Nth
element you'd need to check at least N (and potentially an unbounded)
number of preceding elements in case they get filtered out.
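The point about filter can be checked at compile time. The following is only a minimal sketch written against a current Phobos (the lambda syntax is newer than DMD 2.054), and nthDigit is a hypothetical helper name that does not appear in the thread:

```d
import std.algorithm : filter;
import std.range : drop, isForwardRange, isRandomAccessRange;

// Hypothetical helper: returns the n-th (0-based) digit of s.
// It has to walk the filtered range from the start, so it is O(n) at best.
dchar nthDigit(string s, size_t n)
{
    auto digits = filter!(c => c >= '0' && c <= '9')(s);

    // The filtered range is a forward range, but offers no indexing.
    static assert(isForwardRange!(typeof(digits)));
    static assert(!isRandomAccessRange!(typeof(digits)));
    static assert(!__traits(compiles, digits[n]));

    // drop pops n elements one by one, skipping filtered-out ones on the way.
    return digits.drop(n).front;
}

void main()
{
    import std.stdio : writeln;
    writeln(nthDigit("a1b2c3", 2));
}
```

Every lookup pays for the walk from the front, which is why a search that jumps around in the haystack wants a real array.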
Any help?
If I had this sort of problem I'd use something along these lines:
// needs std.stdio, std.algorithm, std.array and std.uni (for isWhite)
auto file = File("yourfile");
foreach (line; file.byLine())
{
      // this copies everything except whitespace (i.e. the digits) to a new array
      auto onlyDigits = array(filter!((x) { return !isWhite(x); })(line));
      auto result = find(onlyDigits, ... ); // your query here
      // ...
}
Thanks
I don't mind storing it in memory. Each .txt file is around 20MB so the filtered
string should be even smaller.

Still, calling array gives this error:

The error doesn't exactly come from calling array, but I perfectly understand why that confused you.


..\..\src\phobos\std\algorithm.d(3252): Error: function
std.algorithm.BoyerMooreFinder!(result,string).BoyerMooreFinder.beFound (string
haystack) is not callable using argument types (dchar[])
..\..\src\phobos\std\algorithm.d(3252): Error: cannot implicitly convert
expression (haystack) of type dchar[] to string
..\..\src\phobos\std\algorithm.d(3252): Error: cannot implicitly convert
expression (needle.beFound((__error))) of type string to dchar[]
search_seq.d(13): Error: template instance
std.algorithm.find!(dchar[],result,string) error instantiating

Let's drill down to the problem through this barrage of crap:

the problem statement is

 Error: cannot implicitly convert expression (haystack) of type dchar[] to 
string

So (apparently) the problem is that after array(filter!(... you get an array of
dchars (Unicode code points) as a result of filtering a string (which is UTF-8
under the hood, btw), while the needle you are searching for is a UTF-8 string.
And a UTF-8 string is (once again) not random-accessible in terms of
code points (it needs UTF decoding, though that's clearly not needed in your
case).
The simplest workaround I can think of is to convert the needle to dstring:
auto needle =  boyerMooreFinder(to!dstring(args[1])); //found in std.conv
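
For reference, here is a minimal end-to-end sketch of that workaround, written against a current Phobos rather than DMD 2.054. The helper name findDigits and the sample input are hypothetical. It also rests on an assumption taken from the error output above: beFound takes a haystack of the needle's own type, so a mutable dchar[] would presumably still not convert to dstring, and the haystack is therefore made immutable with .idup so that both sides are dstring.

```d
import std.algorithm : boyerMooreFinder, filter, find;
import std.array : array;
import std.conv : to;

// Hypothetical helper, not from the original post: strips everything but
// ASCII digits from text, then searches the result for pattern.
dstring findDigits(string text, string pattern)
{
    // array(filter(...)) over a string yields dchar[];
    // .idup turns it into an immutable dstring (assumption: needed so the
    // haystack type matches the needle type expected by beFound).
    dstring haystack = array(filter!(c => c >= '0' && c <= '9')(text)).idup;

    // Needle converted to dstring, as suggested above.
    auto needle = boyerMooreFinder(to!dstring(pattern));

    // find returns the part of the haystack starting at the first match,
    // or an empty slice when there is no match.
    return find(haystack, needle);
}

void main()
{
    import std.stdio : writeln;
    writeln(findDigits("abc 123 def 4567 ghi", "345"));
}
```

An alternative under the same assumption would be to keep the haystack as dchar[] and convert the needle with to!(dchar[]) instead; either way the two types have to agree.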




From this code:

import std.algorithm;
import std.array;
import std.file;
import std.stdio;

void main(string[] args) {
        auto needle = boyerMooreFinder(args[1]);
        foreach (string name; dirEntries(".", SpanMode.shallow)) {
                if (name[$-3 .. $] == "txt") {
                        writeln(name);
                        string text = readText(name);
                        auto haystack = array(filter!("a >= '0' && a <= '9'")(text));
                        auto result = find(haystack, needle);
                        writeln(result);
                }
        }
}


I'm using DMD 2.054 on Windows if that helps


--
Dmitry Olshansky
