On 17.07.2011 0:41, Willy Martinez wrote:
== Quote from Dmitry Olshansky (dmitry.o...@gmail.com)'s article
If you wish to avoid storing all of this in an array by using e.g.
filter _and_ use Boyer-Moore search on it, then no, you can't do that.
The reason is that filter returns a forward range, with the important
consequence that you can't look at an arbitrary Nth element in O(1). And
Boyer-Moore requires such access to be efficient anywhere in the haystack.
Why doesn't filter provide O(1) random access? Because to get the Nth
element you'd need to check at least N (and potentially an unlimited
number of) elements before it, in case some of them get filtered out.
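To make the point about range categories concrete, here is a minimal sketch (written against a recent Phobos, not DMD 2.054) that asks the compiler which kind of range filter returns for a string. The variable names are only illustrative.

```d
import std.algorithm : filter;
import std.range : isForwardRange, isRandomAccessRange;

void main()
{
    string text = "12 34 56";

    // Lazily keep only ASCII digits; nothing is copied here.
    auto digits = text.filter!(c => c >= '0' && c <= '9');

    // The lazy result can be walked front to back...
    static assert(isForwardRange!(typeof(digits)));
    // ...but it offers no indexing, so Boyer-Moore cannot jump around in it.
    static assert(!isRandomAccessRange!(typeof(digits)));
}
```

Both checks are done at compile time, so the program does nothing at run time; it simply fails to build if either assumption is wrong.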
Any help?
If I had this sort of problem I'd use something along these lines:
import std.algorithm, std.array, std.stdio, std.uni;
auto file = File("yourfile");
foreach (line; file.byLine())
{
    // this copies all non-whitespace characters to a new array
    auto onlyDigits = array(filter!((x){ return !isWhite(x); })(line));
    auto result = find(onlyDigits, ... ); // your query here
    //...
}
Thanks
I don't mind storing it in memory. Each .txt file is around 20MB so the filtered
string should be even smaller.
Still, calling array gives this error:
It's not exactly the call to array that fails, but I perfectly understand
why you mixed the two up.
..\..\src\phobos\std\algorithm.d(3252): Error: function
std.algorithm.BoyerMooreFinder!(result,string).BoyerMooreFinder.beFound (string
haystack) is not callable using argument types (dchar[])
..\..\src\phobos\std\algorithm.d(3252): Error: cannot implicitly convert
expression (haystack) of type dchar[] to string
..\..\src\phobos\std\algorithm.d(3252): Error: cannot implicitly convert
expression (needle.beFound((__error))) of type string to dchar[]
search_seq.d(13):
Error: template instance std.algorithm.find!(dchar[],result,string) error
instantiating
Let's drill down to the problem through this barrage of crap:
the problem statement is
Error: cannot implicitly convert expression (haystack) of type dchar[] to
string
So (apparently) the problem is that after array(filter!(... you get an array
of dchars (Unicode code points) as the result of filtering a string (which is
UTF-8 under the hood, btw), while the needle you are searching for is a UTF-8
string.
And a UTF-8 string is (once again) not random-accessible in the sense of
code points (it needs UTF decoding, though that's clearly not needed in your
case).
The simplest workaround I can think of is to convert the needle to dstring:
auto needle = boyerMooreFinder(to!dstring(args[1])); //found in std.conv
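Put together, the workaround might look like the sketch below, written against a recent Phobos rather than DMD 2.054. The helper name findDigits is hypothetical. It also assumes that the finder wants a haystack of exactly the same type as the needle, so the filtered dchar[] is turned into a dstring with .idup; that step is an addition here, not part of the one-liner above.

```d
import std.algorithm : boyerMooreFinder, filter, find;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: strip everything but ASCII digits from `text`,
// then look for `pattern` in what is left using Boyer-Moore.
dstring findDigits(string text, string pattern)
{
    // The needle is converted to UTF-32 so it matches the filtered haystack.
    auto needle = boyerMooreFinder(to!dstring(pattern));

    // array() yields dchar[]; .idup makes it immutable, i.e. a dstring
    // (assumption: needle and haystack types have to agree).
    dstring haystack = text.filter!(c => c >= '0' && c <= '9').array.idup;

    // find returns the haystack starting at the first match,
    // or an empty slice if there is none.
    return find(haystack, needle);
}

void main()
{
    writeln(findDigits("12 34\n56 78", "3456"));
}
```

Since the files hold only digits and whitespace, the extra memory for UTF-32 (four bytes per digit) is the price of getting random access over code points.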
From this code:
import std.algorithm;
import std.array;
import std.file;
import std.stdio;
void main(string[] args) {
    auto needle = boyerMooreFinder(args[1]);
    foreach (string name; dirEntries(".", SpanMode.shallow)) {
        if (name[$-3 .. $] == "txt") {
            writeln(name);
            string text = readText(name);
            auto haystack = array(filter!("a >= '0' && a <= '9'")(text));
            auto result = find(haystack, needle);
            writeln(result);
        }
    }
}
I'm using DMD 2.054 on Windows if that helps
--
Dmitry Olshansky