I have a database of text documents (letter sequences): several thousand
documents with approx. 1000-2000 letters each.
I need to find exact matches of short sequences of 3-15 letters in those
documents.
Without any regexp patterns, the search for a single 3-15 letter word takes
on the order of 1 s.
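A minimal sketch of a literal (non-regexp) search in base R, for illustration only. The toy vectors `docs` and `patterns` are hypothetical stand-ins for the documents and search words described above, not the poster's data; the only real API used is base R's `grepl()` with `fixed = TRUE`, which treats the pattern as a plain string.

```r
# Hypothetical toy data; `docs` and `patterns` are illustrative names only.
docs <- c("abcdefgh", "xxabcxx", "zzzzzz")
patterns <- c("abc", "zzz", "qqq")

# fixed = TRUE makes grepl() match each pattern as a literal string,
# so no regular-expression interpretation takes place.
hits <- vapply(patterns,
               function(p) grepl(p, docs, fixed = TRUE),
               logical(length(docs)))

# `hits` is a logical matrix: one row per document, one column per pattern.
which(hits[, "abc"])   # indices of the documents containing "abc"
colSums(hits)          # number of matching documents per pattern
```

Whether this is fast enough for several thousand documents would have to be measured on the real data, for example with `system.time()`.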
Dear Duncan,
This is a model of the data I work with.
database <- replicate(5, paste(sample(letters, rexp(1, 1/500), rep=TRUE),
collapse=""))
words <- replicate(1, paste(sample(letters, rexp(1, 1/70), rep=TRUE),
On 03/08/2015 5:25 AM, Witold E Wolski wrote:
I have a database of text documents (letter sequences): several thousand
documents with approx. 1000-2000 letters each.
I need to find exact matches of short sequences of 3-15 letters in those
documents.
Without any regexp patterns the search