Brian,

  It looks like "variable" is itself variable, so you'll probably want to use 
some combination of PhraseQuery, FuzzyQuery and maybe BooleanQuery.  Below is 
my best guess at the underlying Query type that would meet each of your use 
cases.

"free text" : Doc1, Doc2  :: PhraseQuery
"version text" : Doc2, Doc3 :: PhraseQuery with slop or BooleanQuery depending 
on what exactly you mean
"text version" : Doc2, Doc3 :: PhraseQuery with slop 
"some version text" : Doc2, Doc3  :: BooleanQuery (I don't see "some" in any 
of your documents, so that term would have to be optional)??
"long" : Doc2 :: You'll need either a stemming analyzer to match this or a 
FuzzyQuery with maxEdits = 2 (long~2)
"anothr" : Doc3 :: FuzzyQuery with maxEdits = 1 (anothr~1)
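
To make "PhraseQuery with slop" more concrete, here is a minimal sketch in 
plain Java (no Lucene dependency) of a simplified slop model: each phrase term 
must occur in the document, and the spread of (position in document minus 
offset in phrase) across the terms must not exceed the slop.  The class and 
method names are hypothetical, the whitespace tokenizer is only a stand-in for 
a real analyzer, and Lucene's actual sloppy phrase scoring is more involved 
(stop-word handling and repeated terms are ignored here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SloppyPhraseSketch {
    static final String DOC1 = "This is free text";
    static final String DOC2 = "This is a longer version of free text";
    static final String DOC3 = "Yet another version of text";

    // Lowercasing whitespace tokenizer; an assumption standing in for an analyzer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // True if every phrase term occurs in doc and some choice of occurrences
    // keeps the spread of (doc position - phrase offset) within slop.
    static boolean matches(String doc, String phrase, int slop) {
        List<String> docTokens = tokenize(doc);
        List<String> terms = tokenize(phrase);
        List<List<Integer>> adjusted = new ArrayList<>();
        for (int offset = 0; offset < terms.size(); offset++) {
            List<Integer> positions = new ArrayList<>();
            for (int pos = 0; pos < docTokens.size(); pos++) {
                if (docTokens.get(pos).equals(terms.get(offset))) {
                    positions.add(pos - offset);
                }
            }
            if (positions.isEmpty()) {
                return false; // a phrase term is missing from the document
            }
            adjusted.add(positions);
        }
        return search(adjusted, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop);
    }

    // Tries one occurrence per term, pruning as soon as the spread exceeds slop.
    static boolean search(List<List<Integer>> adjusted, int term,
                          int min, int max, int slop) {
        if (term == adjusted.size()) {
            return true;
        }
        for (int p : adjusted.get(term)) {
            int newMin = Math.min(min, p);
            int newMax = Math.max(max, p);
            if (newMax - newMin <= slop
                    && search(adjusted, term + 1, newMin, newMax, slop)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches(DOC1, "free text", 0));
        System.out.println(matches(DOC3, "version text", 1));
        System.out.println(matches(DOC3, "text version", 3));
    }
}
```

Under this simplified model, "free text" needs no slop for Doc1 and Doc2, 
"version text" needs a slop of 2 to reach both Doc2 and Doc3, and the reversed 
"text version" needs a larger slop (4) because out-of-order terms cost more.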

And maybe even:

"another longer free text" : Doc1, Doc2, Doc3  :: BooleanQuery
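
One possible reading of the BooleanQuery cases is "optional terms, with a 
minimum number that must match".  The sketch below models that idea in plain 
Java against the three sample documents; it is not Lucene code, the names are 
hypothetical, and the minimum of 2 is my assumption rather than something 
stated in the use cases (Lucene's BooleanQuery does offer a 
minimum-number-should-match setting for optional clauses).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ShouldMatchSketch {
    static final String[] DOCS = {
        "This is free text",
        "This is a longer version of free text",
        "Yet another version of text"
    };

    // Lowercasing whitespace tokenizer; an assumption standing in for an analyzer.
    static Set<String> tokenize(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    // Returns the names of documents containing at least minShouldMatch
    // distinct query terms (every term is treated as optional).
    static List<String> search(String query, int minShouldMatch) {
        Set<String> queryTerms = tokenize(query);
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.length; i++) {
            Set<String> docTerms = tokenize(DOCS[i]);
            int matched = 0;
            for (String term : queryTerms) {
                if (docTerms.contains(term)) {
                    matched++;
                }
            }
            if (matched >= minShouldMatch) {
                hits.add("Doc" + (i + 1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("some version text", 2));
        System.out.println(search("another longer free text", 2));
    }
}
```

With a minimum of 2, "some version text" returns Doc2 and Doc3 while 
"another longer free text" returns all three documents.  With a minimum of 1, 
"some version text" would also return Doc1, since "text" appears in every 
document.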

FuzzyQuery captures variation within a token (Levenshtein edit distance or, 
more precisely, Optimal String Alignment distance: you can get from "another" 
to "anothr" by deleting a single character); PhraseQuery, with slop, allows 
flexibility in how tokens are ordered and spaced.
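
To check the edit counts used above, here is a small stand-alone sketch of the 
Optimal String Alignment distance (Levenshtein plus adjacent transpositions) 
in plain Java.  It is a textbook dynamic-programming version with a 
hypothetical class name, not Lucene's implementation, which works differently 
internally.

```java
public class OsaDistanceSketch {

    // Optimal String Alignment distance: insertions, deletions and
    // substitutions cost 1, and so does swapping two adjacent characters.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i characters
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j characters
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "tetx" -> "text".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("anothr", "another"));
        System.out.println(distance("long", "longer"));
    }
}
```

Here "anothr" is one edit away from "another" and "long" is two edits away 
from "longer", which is why the suggestions above use ~1 and ~2 respectively.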

  Do you need to generate your queries by hand in code, or would a query 
parser help out?  See this for the classic parser's syntax: 
http://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html

  Best,

            Tim

-----Original Message-----
From: Wasikowski, Brian [ JRDUS] [mailto:bwasi...@its.jnj.com] 
Sent: Friday, September 13, 2013 1:03 PM
To: java-user@lucene.apache.org
Subject: variable string search

First let me start by saying:  I'm sorry!

I know this question has probably been asked and answered already, but I am new 
to this project and just trying to get up to speed.  I do have a very simple 
example working, but not quite how I'd like.   So let me explain what I'd like 
to do and see if the community can suggest the proper analyzer and query.

Consider the following data to be indexed:

Doc1: This is free text
Doc2: This is a longer version of free text
Doc3: Yet another version of text

Right now, the working example I have will match a query for "text" with all 3 
documents.
However, any combination of more words or partial words does not work.

Use cases:

"free text" : Doc1, Doc2  :: PhraseQuery
"version text" : Doc2, Doc3 :: PhraseQuery with slop or BooleanQuery depending 
on what exactly you mean
"text version" : Doc2, Doc3 :: PhraseQuery with slop and no directionality
"some version text" : Doc2, Doc3  :: BooleanQuery ??
"long" : Doc2 :: You'll need either a stemming analyzer to match this or a 
FuzzyQuery long~2
"anothr" : Doc3 :: FuzzyQuery anothr~1

And maybe even:

"another longer free text" : Doc1, Doc2, Doc3  :: BooleanQuery

Any help is appreciated.  Here are the components I am currently using:

Lucene.Net.Analysis.Standard.StandardAnalyzer
Lucene.Net.Search.Query query = new Lucene.Net.Search.FuzzyQuery
Lucene.Net.Search.TopDocs hits = searcher.Search

________________________________________
Brian Wasikowski
Director, HIT Alliances and Support
Janssen Diagnostics, Inc.
Tel: +1 919 786 9153
Fax: +1 919 882 0913
Email: bwasi...@its.jnj.com<mailto:bwasi...@its.jnj.com>
Web: www.janssendiagnostics.com


