On 16.07.2011 23:05, Willy Martinez wrote:
Hello. I'm new to D but bear with me, please.
I have several files that look like this:
71104 08924 72394 13995 49707 98696
48245 08311 44066 67172 56025 07952
00384 37808 90166 13871 94258 37216
I'm trying to read those files and search for sequences of digits inside,
hopefully with the Boyer-Moore implementation in std.algorithm.
Right now I have made a small script that iterates over the .txt files in the
current directory and reads line by line and uses find on it.
But I haven't been able to write a range that removes the whitespace and can
be used with find. It should generate one long stream of digits like:
711040892472394139954970798696482450831144066671725602507952003843780890166138719425837216
If you wish to avoid storing all of this in an array (e.g. by using
filter) _and_ still use Boyer-Moore search on it, then no, you can't do that.
The reason is that filter returns a forward range, with the important
consequence that you can't look at an arbitrary Nth element in O(1).
Boyer-Moore requires such access to be efficient at any position.
Why doesn't filter provide O(1) random access? Because to get the Nth
element you'd need to check at least N (and potentially an unbounded
number of) elements before it, in case some of them get filtered out.
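To make the range categories concrete, here is a minimal sketch using the traits from std.range. The helper name lazyDigits is made up for illustration, and the sample input is the start of the data from the question.

```d
import std.algorithm : filter;
import std.array : array;
import std.ascii : isWhite;
import std.range : isForwardRange, isRandomAccessRange;

// Hypothetical helper: a lazy view of `line` with whitespace skipped.
auto lazyDigits(string line)
{
    return filter!(c => !isWhite(c))(line);
}

void main()
{
    auto filtered = lazyDigits("71104 08924 72394");

    // The filtered view can be walked front to back...
    static assert(isForwardRange!(typeof(filtered)));
    // ...but it offers no indexing, so filtered[5] would not compile.
    static assert(!isRandomAccessRange!(typeof(filtered)));

    // Copying into an array buys O(1) indexing at the cost of memory.
    auto digits = array(filtered);
    static assert(isRandomAccessRange!(typeof(digits)));
    assert(digits[5] == '0');
}
```

The copy is what makes an index-based search possible; the lazy view alone can only be scanned sequentially.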
Any help?
If I had this sort of problem, I'd use something along these lines:
import std.algorithm, std.array, std.ascii, std.stdio;

auto file = File("yourfile");
foreach (line; file.byLine())
{
    // this copies all non-whitespace characters to a new array
    auto onlyDigits = array(filter!((x){ return !isWhite(x); })(line));
    auto result = find(onlyDigits, ... ); // your query here
    // ...
}
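Note that searching line by line cannot see a sequence that starts on one line and ends on the next. If that matters, one option is to collect the digits of the whole file into a single string first and then search it with boyerMooreFinder from std.algorithm. The following is only a sketch under that assumption: stripWhitespace and indexOfDigits are hypothetical helper names, and the three sample lines from the question stand in for what byLine would deliver.

```d
import std.algorithm : boyerMooreFinder, find;
import std.array : appender;
import std.ascii : isWhite;
import std.stdio : writeln;

// Hypothetical helper: returns `text` with all whitespace removed.
string stripWhitespace(const(char)[] text)
{
    auto sink = appender!string();
    foreach (char c; text)
        if (!isWhite(c))
            sink.put(c);
    return sink.data;
}

// Hypothetical helper: offset of the first occurrence of `pattern`
// in `digits`, or -1 if it does not occur.
ptrdiff_t indexOfDigits(string digits, string pattern)
{
    auto rest = find(digits, boyerMooreFinder(pattern));
    if (rest.length == 0)
        return -1;
    return cast(ptrdiff_t)(digits.length - rest.length);
}

void main()
{
    // Stand-in for the lines read from a file.
    auto lines = ["71104 08924 72394 13995 49707 98696",
                  "48245 08311 44066 67172 56025 07952",
                  "00384 37808 90166 13871 94258 37216"];

    // Build one contiguous stream of digits for the whole file.
    auto digits = appender!string();
    foreach (line; lines)
        digits.put(stripWhitespace(line));

    // This pattern spans the end of line 1 and the start of line 2.
    writeln(indexOfDigits(digits.data, "9869648245"));
}
```

This keeps the whole file's digits in memory, which is the price of the random access that Boyer-Moore needs.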
Thanks
--
Dmitry Olshansky