[R] Faster text search in document database than with grep?

2015-08-03 Witold E Wolski
I have a database of text documents (letter sequences): several thousand documents of approx. 1000-2000 letters each. I need to find exact matches of short 3-15 letter sequences in those documents. Without any regexp patterns, searching for one 3-15 letter word takes on the order of 1 s.
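For readers following along, here is a minimal sketch of exact (non-regexp) matching in base R, using `grepl(..., fixed = TRUE)`. The toy `database` and `words` vectors below are hypothetical stand-ins, not the poster's data, and this is not necessarily the call the poster was timing.

```r
# Hypothetical toy data; the real database holds thousands of longer documents.
database <- c("abcdefgh", "xyzabcxx", "qqqqqqqq")
words <- c("abc", "qq", "zzz")

# fixed = TRUE makes grepl() treat each word as a literal string
# instead of compiling it as a regular expression.
matches <- lapply(words, function(w) which(grepl(w, database, fixed = TRUE)))
names(matches) <- words

# matches[[w]] holds the indices of the documents that contain word w;
# a word that occurs nowhere yields an empty integer vector.
print(matches)
```

Each word still triggers one full pass over the database, so the cost grows with the number of words times the total text length.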

Re: [R] Faster text search in document database than with grep?

2015-08-03 Witold E Wolski
Dear Duncan,

This is a model of the data I work with.

database <- replicate(5, paste(sample(letters,rexp(1,1/500), rep=TRUE), collapse=""))
words <- replicate(1,paste(sample(letters,rexp(1,1/70), rep=TRUE),

Re: [R] Faster text search in document database than with grep?

2015-08-03 Duncan Murdoch
On 03/08/2015 5:25 AM, Witold E Wolski wrote: I have a database of text documents (letter sequences): several thousand documents of approx. 1000-2000 letters each. I need to find exact matches of short 3-15 letter sequences in those documents. Without any regexp patterns the search