Hi Eric, all,
with a 'k' :))
Several of my terms are in fact keyphrases with 2 or more words separated by whitespace, e.g. 'host defense'.
You've not told us how you are indexing. What field type are you using? From your description it seems you want to analyze text as it may have special characters.
These are the types of decisions that really matter when using Lucene. My first hunch is that you need a domain-aware analyzer that knows when it sees "host defense", "Host-Defense", "Host_DEFENSE" that it tokenizes it as "host defense".
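To make the normalization idea concrete, here is a minimal plain-Java sketch of what such domain-aware handling could do to a single keyphrase. The class and method names (KeyphraseNormalizer, normalize) are hypothetical and not part of Lucene; a real solution would live inside a custom Analyzer, and the rules here (lowercase, treat hyphens and underscores as spaces) are only an assumption about what the domain needs.

```java
import java.util.Locale;

// Hypothetical helper, not a Lucene class: maps spelling variants of a
// keyphrase onto one canonical form.
final class KeyphraseNormalizer {

    // Lowercase the phrase, turn runs of whitespace, '_' and '-' into a
    // single space, and strip leading/trailing spaces.
    static String normalize(String phrase) {
        return phrase
                .toLowerCase(Locale.ROOT)
                .replaceAll("[\\s_-]+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String[] variants = {"host defense", "Host-Defense", "Host_DEFENSE"};
        for (String variant : variants) {
            System.out.println(variant + " -> " + normalize(variant));
        }
    }
}
```

The same normalization would have to be applied both at index time and at query time, otherwise the indexed terms and the query terms will not line up.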
Or perhaps you need an analyzer that does a floating window of two words and bi-grams them into single tokens?
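The two-word window can be sketched in plain Java as follows. This only shows the windowing step on an already tokenized list; the class name BigramWindow is made up for illustration, and wiring the idea into a Lucene TokenFilter is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lucene class: slides a window of two words
// over a token list and joins each pair into a single token.
final class BigramWindow {

    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        // Stop one short of the end so that tokens.get(i + 1) always exists.
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("innate", "host", "defense", "mechanisms");
        System.out.println(bigrams(tokens));
    }
}
```

With tokens like these, a two-word keyphrase such as 'host defense' would exist in the index as one term and could be matched by a plain TermQuery, at the cost of a larger index.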
I don't really have any quick and easy answers for you - you're asking for domain specific common sense in the analysis process from what I am gathering, and Lucene itself makes this possible but does not give it to you for free.
You could, perhaps, take an easier way out and run text through an Analyzer as you build up your query, without using QueryParser. Look, again, at my AnalysisDemo code in the java.net article.... just pull what you need from there to process a TokenStream out of an Analyzer.
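As a rough outline of that approach, the sketch below analyzes each input term before a clause is built for it. It is plain Java with no Lucene dependency: simpleAnalyze is a crude, ASCII-only stand-in for pulling tokens out of an Analyzer's TokenStream, and QueryTermPlanner and plan are hypothetical names. In real code, a one-token result would presumably become a TermQuery and a multi-token result a PhraseQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch: decide, per input term, which tokens a query clause
// should be built from. Not Lucene code.
final class QueryTermPlanner {

    // Stand-in for an Analyzer (assumption): lowercase, then split on
    // anything that is not an ASCII letter or digit.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns one token list per input term; terms that analyze to nothing
    // are skipped. One token suggests a term clause, several a phrase clause.
    static List<List<String>> plan(String[] terms, Function<String, List<String>> analyze) {
        List<List<String>> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> tokens = analyze.apply(term);
            if (!tokens.isEmpty()) {
                clauses.add(tokens);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        String[] terms = {"Host-Defense", "immunity"};
        System.out.println(plan(terms, QueryTermPlanner::simpleAnalyze));
    }
}
```

The point of the design is that the query terms go through the same kind of analysis as the indexed text, so case and punctuation differences are removed on both sides.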
Erik
They are obviously not handled properly during the construction of the boolean query, because 'host defense' is not found though it is in the field. Replacing the whitespace in between the words by an underscore ('host_defense', which is recognised by the query parser and yields similar results to double quoting, e.g. "host defense") did not retrieve anything either ...
I had to convert to lowercase before sending to this function because - unlike in the QueryParser call - no analyzer is used at the moment. Indexing was done with StandardAnalyzer, so I would prefer using an analyser at search as well. The terms are well formed because they are taken from a domain ontology, but there could be inconsistencies in spelling between what is in the ontology and what is in the field, e.g. 'host-defense', which would need equivalent handling to 'host defense'. I guess this will be dealt with by the analyser - but where do I put it within the current code (see below) with boolean query generation?
Any hints ? Anyway - thanks a lot so far !
Holger
Code follows:
public String[] doSearchBQ(String index_path, String[] myquery){
    // does query processing without QueryParser but by constructing a boolean query
    try {
        Searcher searcher = new IndexSearcher(index_path);
        Analyzer analyzer = new StandardAnalyzer();
        BooleanQuery query = new BooleanQuery();
        // for each term to add:
        for (int j=0; j<myquery.length; j++){
            query.add(new TermQuery(new Term("subject", myquery[j])), false, false);
        }
        Hits hits = searcher.search(query);
        lucene_out = new String[hits.length()];
        for (int i = 0; i < hits.length(); i ++) {
            Document doc = hits.doc(i);
            String name = doc.get("filename");
            lucene_out[i] = name + "|" + doc.get("subject") + "|" + doc.get("message");
        }
        searcher.close();
    } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
    }
    return lucene_out;
}
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]