Brian,

It looks like "variable" is variable, and you'll probably want to use some combination of PhraseQuery, FuzzyQuery, and maybe BooleanQuery. Below is my best guess at which underlying Query types would meet your use cases.
"free text" : Doc1, Doc2 :: PhraseQuery
"version text" : Doc2, Doc3 :: PhraseQuery with slop, or BooleanQuery, depending on exactly what you mean
"text version" : Doc2, Doc3 :: PhraseQuery with slop
"some version text" : Doc2, Doc3 :: BooleanQuery? (I don't see "some" in any of your documents.)
"long" : Doc2 :: You'll need to use a stemming analyzer to match this, or use FuzzyQuery with maxEdits = 2 (long~2)
"anothr" : Doc3 :: FuzzyQuery with maxEdits = 1 (anothr~1)
And maybe even:
"another longer free text" : Doc1, Doc2, Doc3 :: BooleanQuery

FuzzyQuery captures variation within a token (Levenshtein edit distance, or more precisely Optimal String Alignment: you can get from "another" to "anothr" with a single edit). PhraseQuery, with slop, tolerates gaps between tokens and changes in their order.

Do you need to build your queries by hand in code, or would a query parser help? See this for the classic parser's syntax: http://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html

Best,

Tim

-----Original Message-----
From: Wasikowski, Brian [JRDUS] [mailto:bwasi...@its.jnj.com]
Sent: Friday, September 13, 2013 1:03 PM
To: java-user@lucene.apache.org
Subject: variable string search

First let me start by saying: I'm sorry! I know this question has probably been asked and answered already, but I am new to this project and just trying to get up to speed. I do have a very simple example working, but not quite how I'd like. So let me explain what I'd like to do and see if the community can suggest the proper analyzer and query.

Consider the following data to be indexed:

Doc1: This is free text
Doc2: This is a longer version of free text
Doc3: Yet another version of text

Right now, my working example matches a query for "text" against all 3 documents. However, queries with multiple words or with partial words do not match.
Use cases:

"free text" : Doc1, Doc2 :: PhraseQuery
"version text" : Doc2, Doc3 :: PhraseQuery with slop, or BooleanQuery, depending on exactly what you mean
"text version" : Doc2, Doc3 :: PhraseQuery with slop and no directionality
"some version text" : Doc2, Doc3 :: BooleanQuery ??
"long" : Doc2 :: You'll need to use a stemming analyzer to match this, or use FuzzyQuery long~2
"anothr" : Doc3 :: FuzzyQuery anothr~1
And maybe even:
"another longer free text" : Doc1, Doc2, Doc3 :: BooleanQuery

Any help is appreciated. Here are the components I am currently using:

Lucene.Net.Analysis.Standard.StandardAnalyzer
Lucene.Net.Search.Query query = new Lucene.Net.Search.FuzzyQuery
Lucene.Net.Search.TopDocs hits = searcher.Search

________________________________________
Brian Wasikowski
Director, HIT Alliances and Support
Janssen Diagnostics, Inc.
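For reference, here is a small stdlib-only Java sketch (no Lucene dependency) that puts numbers on the two mechanisms in the reply: the edit distance that FuzzyQuery's maxEdits bounds, and the slop a two-term PhraseQuery needs. The class and method names (SearchSketch, osa, minSlop) are hypothetical. It assumes a simplified tokenizer (lowercase, split on whitespace, every word keeps its position); I believe StandardAnalyzer's stop filter leaves position gaps where stop words were removed, so the positions should line up, but treat that as an assumption. minSlop() is my reading of how a sloppy phrase match is measured for two terms, not Lucene's actual scorer.

```java
import java.util.Locale;

// Hypothetical, stdlib-only sketch; it does not call Lucene.
final class SearchSketch {

    // Optimal String Alignment distance: insertions, deletions, substitutions,
    // and transpositions of two adjacent characters each cost 1.
    static int osa(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "ab" -> "ba".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    // Assumed tokenizer: lowercase and split on whitespace; no words are dropped,
    // so each token's array index is its position.
    static String[] tokens(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\s+");
    }

    // Smallest slop at which the phrase "first second" matches the document, or -1
    // if either term is missing. For positions p1 (first) and p2 (second) the cost
    // is |(p2 - 1) - p1|: 0 when adjacent in order, 1 with one word between,
    // 2 when adjacent but swapped.
    static int minSlop(String doc, String first, String second) {
        String[] toks = tokens(doc);
        int best = -1;
        for (int p1 = 0; p1 < toks.length; p1++) {
            if (!toks[p1].equals(first)) continue;
            for (int p2 = 0; p2 < toks.length; p2++) {
                if (p2 == p1 || !toks[p2].equals(second)) continue;
                int cost = Math.abs((p2 - 1) - p1);
                if (best < 0 || cost < best) best = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] docs = {
            "This is free text",
            "This is a longer version of free text",
            "Yet another version of text"
        };
        System.out.println("osa(anothr, another) = " + osa("anothr", "another"));
        System.out.println("osa(long, longer) = " + osa("long", "longer"));
        for (int i = 0; i < docs.length; i++) {
            System.out.printf("Doc%d: \"free text\" slop=%d, \"version text\" slop=%d, \"text version\" slop=%d%n",
                    i + 1,
                    minSlop(docs[i], "free", "text"),
                    minSlop(docs[i], "version", "text"),
                    minSlop(docs[i], "text", "version"));
        }
    }
}
```

With the three sample documents, "free text" matches Doc1 and Doc2 at slop 0, "version text" needs slop 2 for Doc2 and 1 for Doc3, and the reversed "text version" needs 4 and 3 (-1 means a term is missing). The edit distances come out as 1 for anothr and 2 for long versus longer, which is why the reply suggests long~2 rather than long~1.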