I guess, to be more concise, I'm looking to get all the terms that were searched for so I can highlight them in the original document. After looking through the highlighter contrib class, I figured I had found my solution with query.extractTerms. It works great for searches like:
    genera* -> generally, general
    ac? -> act
    General Act -> general, act

and a bunch of others I've tested. So it's almost perfect, except when
searching for a phrase. If someone searched for "General Act", I wouldn't
want General and Act highlighted unless they were right beside each other.

Thanks,
Spencer

On Feb 5, 2008 12:50 PM, Spencer Tickner <[EMAIL PROTECTED]> wrote:
> Hi Erick,
>
> Thanks for your response. I think you're right about the
> WhitespaceAnalyzer. I was actually using the StandardAnalyzer before and
> tried the WhitespaceAnalyzer to see if the StandardAnalyzer was pulling
> off the quotes. I guess what I'm trying to mimic is the information
> found at:
>
> http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> What analyzer would you suggest when parsing a query like:
>
> title:"The Right Way" AND text:go
>
> Or will I have to pull apart a user-entered query using regular
> expressions, or whatever, and use different Queries (such as the
> SpanNearQuery) to get the extracted terms?
>
> Thanks for any advice.
>
> Spencer
>
> On Feb 5, 2008 12:19 PM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> > I don't think WhitespaceAnalyzer is doing what you think it is. From
> > the Javadoc...
> >
> > public class WhitespaceTokenizer extends CharTokenizer
> >
> > A WhitespaceTokenizer is a tokenizer that divides text at
> > whitespace. Adjacent sequences of non-Whitespace characters form tokens.
> >
> > ------------------------------
> >
> > CharTokenizer
> > An abstract base class for simple, character-oriented tokenizers.
> >
> > So I'm pretty sure that CharTokenizer is throwing out all the
> > non-character data (i.e. your double quotes), then WhitespaceTokenizer
> > is breaking on the space.
> >
> > What is it that you want to have happen?
> > If you're searching for "General" right next to "Act", you can use a
> > SpanNearQuery with two SpanTermQuerys and a slop of 0.
> >
> > The other thing to be aware of with WhitespaceAnalyzer is that
> > it doesn't lowercase anything, so whether you'll get any hits
> > in your index depends upon the analyzers you used to index with
> > and whether case matches exactly.
> >
> > Best
> > Erick
> >
> > On Feb 5, 2008 3:03 PM, Spencer Tickner <[EMAIL PROTECTED]> wrote:
> >
> > > Hi List,
> > >
> > > Thanks in advance for the help. I'm trying to extract terms from a
> > > query. From the reading I've done, a phrase such as "General Act" is
> > > considered a term:
> > > http://lucene.apache.org/java/docs/queryparsersyntax.html#Terms
> > > However, when I test extractTerms on my query, it splits this into
> > > General and Act. I'm wondering if I'm missing or not understanding
> > > something.
> > >
> > > My test Java code is:
> > >
> > > private String FIELD_NAME = "rr_root";
> > > private Query query;
> > > private Hits hits = null;
> > >
> > > public void testSearch() throws Exception
> > > {
> > >     doSearching("\"General Act\"");
> > >     HashSet terms = new HashSet();
> > >     query.extractTerms(terms);
> > >     int i = 0;
> > >     for (Iterator iter = terms.iterator(); iter.hasNext();)
> > >     {
> > >         i++;
> > >         Term term = (Term)iter.next();
> > >         System.out.println(i + " " + "term-" + term.text()
> > >                 + " field-" + term.field());
> > >     }
> > > }
> > >
> > > public void doSearching(String queryString) throws Exception
> > > {
> > >     QueryParser parser = new QueryParser(FIELD_NAME,
> > >             new WhitespaceAnalyzer());
> > >     query = parser.parse(queryString);
> > >     doSearching(query);
> > > }
> > >
> > > public void doSearching(Query unReWrittenQuery) throws Exception
> > > {
> > >     // searcher coming from a cached class
> > >     searcher = aspect.getSearcher();
> > >     // reader coming from a cached class
> > >     query = unReWrittenQuery.rewrite(aspect.getReader());
> > >     System.out.println("Searching for: "
> > >             + query.toString(FIELD_NAME));
> > >     hits = searcher.search(query);
> > > }
> > >
> > > The current output is:
> > >
> > > Searching for: "General Act"
> > > 1 term-General field-rr_root
> > > 2 term-Act field-rr_root
> > >
> > > The output I expect is:
> > >
> > > Searching for: "General Act"
> > > 1 term-General Act field-rr_root
> > >
> > > Thanks for any help.
> > >
> > > Spencer
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
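
To make the phrase requirement from this thread concrete ("General" and "Act" highlighted only when they sit right beside each other), here is a minimal sketch in plain Java. It is not from the thread and does not use any Lucene API: the class PhraseHighlightSketch, its methods findPhrase and highlight, and the <b> markers are all hypothetical names chosen for illustration. It assumes whitespace tokenization and case-sensitive matching, roughly mirroring the WhitespaceAnalyzer behaviour discussed above, so a token such as "Act." (with trailing punctuation) would not match "Act".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: highlights a phrase only where
// its terms appear as consecutive whitespace-separated tokens.
class PhraseHighlightSketch {

    /** Character span [start, end) in the original text. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Returns one span per place where all phrase terms occur adjacently, in order. */
    static List<Span> findPhrase(String text, String... phraseTerms) {
        // Tokenize on whitespace, remembering each token's offsets.
        List<Span> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            tokens.add(new Span(m.start(), m.end()));
        }

        List<Span> matches = new ArrayList<>();
        if (phraseTerms.length == 0) {
            return matches;
        }
        // Slide a window of phraseTerms.length tokens over the text.
        for (int i = 0; i + phraseTerms.length <= tokens.size(); i++) {
            boolean allMatch = true;
            for (int j = 0; j < phraseTerms.length; j++) {
                Span t = tokens.get(i + j);
                // Case-sensitive comparison (assumption: no lowercasing).
                if (!text.substring(t.start, t.end).equals(phraseTerms[j])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                Span first = tokens.get(i);
                Span last = tokens.get(i + phraseTerms.length - 1);
                matches.add(new Span(first.start, last.end));
            }
        }
        return matches;
    }

    /** Wraps each phrase match in <b>...</b>, leaving the rest of the text untouched. */
    static String highlight(String text, String... phraseTerms) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Span s : findPhrase(text, phraseTerms)) {
            if (s.start < last) {
                continue; // skip a match that overlaps the previous one
            }
            out.append(text, last, s.start)
               .append("<b>").append(text, s.start, s.end).append("</b>");
            last = s.end;
        }
        return out.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight(
                "The General Act replaced an Act of general scope.",
                "General", "Act"));
        // prints: The <b>General Act</b> replaced an Act of general scope.
    }
}
```

The lone "Act" and the lowercase "general" are left alone because they are not part of an adjacent, exactly matching pair. In a real setup the phrase terms would have to come from the parsed query rather than being passed in by hand, which is the part the thread is still trying to solve.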
