Does Lucene support this type of structure, or do I need to somehow implement it outside Lucene?
By the way, I need this to run on an Android phone, so memory size might be an issue...

Thanks,

Ilya Zavorin

-----Original Message-----
From: Dawid Weiss [mailto:dawid.we...@gmail.com]
Sent: Friday, August 24, 2012 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene

What you need is a suffix tree or a suffix array. Both data structures will allow you to perform constant-time searches for existence/occurrence of any input pattern. Depending on how much text you have on the input it may either be a simple task -- see here:

http://labs.carrotsearch.com/jsuffixarrays.html

or a complicated task if your input size is larger (larger than memory). Google search for suffix trees/suffix arrays though, it's the data structure to use here.

Dawid

On Fri, Aug 24, 2012 at 9:48 PM, Ilya Zavorin <izavo...@caci.com> wrote:
> Hi Everyone,
>
> I have the following task. I have a set of documents in multiple languages.
> I don't know what these languages are. Any given doc may contain text in
> several languages mixed up. So to me these are just a bunch of Unicode text
> files.
>
> What I need is to implement an efficient EXACT string lookup. That is, I
> need to be able to find ANY Unicode string exactly as it appears. I do not
> care about language-specific modifications of the string. That is, if I
> search for a string "run", I do not need to find "ran" but I do want to find
> it in all of these strings below:
>
> Fox is running fast
> !%#^&$run!$!%@&$#
> run,run
>
> Is there a way of using StandardAnalyzer or any other analyzer and the
> corresponding query parser to find these? Again, my queries might be more or
> less random Unicode sequences and I need to find all their occurrences in
> the text.
>
> Essentially, what I am trying to do is implement substring matching more
> efficiently than using Java's standard substring matching methods.
>
> Thanks!
>
> Ilya Zavorin
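
For readers who want to see the suffix array idea from the reply in code, here is a minimal sketch in plain Java. It is not the jsuffixarrays API and not anything Lucene provides; the class name SuffixArraySketch and its methods are made up for illustration. It assumes the whole text fits in memory, compares raw UTF-16 code units (so matching is exact, with no language-specific processing), and builds the array by naive sorting, which is fine for small inputs but slow for large ones. Each lookup is a binary search over the sorted suffixes, so it costs roughly O(m log n) character comparisons for a pattern of length m in a text of length n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SuffixArraySketch {

    /** Naive construction: sort every suffix start offset by the suffix it begins. */
    static int[] build(final String text) {
        Integer[] idx = new Integer[text.length()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> compareSuffixes(text, a, b));
        int[] sa = new int[idx.length];
        for (int i = 0; i < sa.length; i++) sa[i] = idx[i];
        return sa;
    }

    /** Lexicographic comparison of two suffixes by UTF-16 code unit, without copying. */
    static int compareSuffixes(String text, int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            char ca = text.charAt(a), cb = text.charAt(b);
            if (ca != cb) return ca - cb;
            a++;
            b++;
        }
        return b - a; // the suffix that ran out first is the shorter one and sorts first
    }

    /** Returns 0 when the suffix at 'start' begins with 'pattern', else its sort order. */
    static int comparePrefix(String text, int start, String pattern) {
        int n = text.length();
        for (int k = 0; k < pattern.length(); k++) {
            if (start + k >= n) return -1; // suffix ended early: it sorts before the pattern
            int d = text.charAt(start + k) - pattern.charAt(k);
            if (d != 0) return d;
        }
        return 0;
    }

    /** Binary search for the first (upper == false) or one-past-last matching slot. */
    static int bound(String text, int[] sa, String pattern, boolean upper) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(text, sa[mid], pattern);
            if (c < 0 || (upper && c == 0)) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** All start offsets at which 'pattern' occurs in 'text', in ascending order. */
    static List<Integer> findAll(String text, int[] sa, String pattern) {
        int from = bound(text, sa, pattern, false);
        int to = bound(text, sa, pattern, true);
        List<Integer> hits = new ArrayList<>();
        for (int i = from; i < to; i++) hits.add(sa[i]);
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        // The three example strings from the question, joined into one text.
        String text = "Fox is running fast\n!%#^&$run!$!%@&$#\nrun,run";
        int[] sa = build(text);
        System.out.println("run -> " + findAll(text, sa, "run"));
        System.out.println("ran -> " + findAll(text, sa, "ran"));
    }
}
```

On the memory concern: the finished array costs one int (4 bytes) per UTF-16 code unit on top of the text itself, and this naive build temporarily allocates a boxed Integer per position as well. For anything beyond small inputs, a dedicated construction algorithm such as those in the library linked in the reply is likely the better choice.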