Hi all,
My Lucene IndexSearcher returns too few hits when I use some of the extended
query syntax. I'll give examples of my query/hit pairs at the bottom.
I'm indexing a database table:
<CODE>
//creating a full index
DBConnection conn = null;
try {
    Analyzer analyzer = new StandardAnalyzer();
    // third argument true = create a new index, overwriting any existing one
    IndexWriter writer = new IndexWriter(indexFile, analyzer, true);
    conn = DBConnectionFactory.getDBConnetion(DBConnectionFactory.ORACLE);
    String query = "select id, category, problem, solution from kedb.solution";
    conn.open(cred);
    ResultSet rs = conn.execute(query);
    while (rs.next()) {
        Document doc = new Document();
        // Field.Text: stored, indexed and tokenized
        doc.add(Field.Text("id", rs.getString(1)));
        doc.add(Field.Text("category", rs.getString(2)));
        doc.add(Field.Text("problem", rs.getString(3)));
        // Field.UnStored: indexed and tokenized, but not stored, so
        // hits.doc(i).get("solution") returns null
        doc.add(Field.UnStored("solution", rs.getString(4)));
        writer.addDocument(doc);
    }
    writer.close();
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    // close the connection exactly once, and only if it was obtained
    if (conn != null) {
        conn.close();
    }
}

//searching the index
Analyzer analyzer = new StandardAnalyzer();
Searcher searcher = new IndexSearcher(IndexReader.open(indexFile));
// terms without a "field:" prefix are searched in the default field "problem"
Query q = QueryParser.parse(query, "problem", analyzer);
Hits hits = searcher.search(q);
for (int i = 0; i < hits.length(); i++) {
    Document d = hits.doc(i); // fetch each hit only once
    if (cat.length() == 0 || cat.equals(d.get("category"))) {
        ids.add(d.get("id"));
        cats.add(d.get("category"));
        probs.add(d.get("problem"));
    }
}
searcher.close();
</CODE>
This is sample data from the table (helpdesk problems and their solutions).
Yes, it's German, not English!
I tried GermanAnalyzer instead of StandardAnalyzer, but that made the
search results even worse.
ID;CATEGORY;PROBLEM;SOLUTION;
1;2;Irgendetwas mit dem Novell Server stimmt nicht;Server neu booten;
2;15;User kann sich nicht mehr am Novell anmelden. Kabel ist eingesteckt. Neubooten nützt nichts. Baum wird nicht gefunden.;Baum manuell suchen über Button auf Login-Maske.;
3;9;Makros in Word funktionieren nicht;Optionen geändert (Medium Level). Extras - Optionen - Makros;
4;5;Scanner funktioniert nicht. Gerät erscheint nicht in der Liste der USB Geräte.;Neusten Treiber installieren. VORSICHT: NT 4.0 unterstützt kein USB!!!;
5;16;Maus spinnt;Reinigen/Auswechseln;
6;16;Eheprobleme;Adresse einer Beratungsstelle vermitteln;
7;11;Browser bringt imme eine Sexseite als Startseite;Extras - Optionen - Startseite festlegen;
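For reference, here is roughly what the indexing loop turns a row into. This is a plain-Java toy, not Lucene: the hypothetical `tokenize` helper splits on anything that isn't a letter or digit and lowercases each token, which approximates StandardAnalyzer but skips its English stop-word list and its special handling of e-mail addresses, hostnames and acronyms. Loading row 3 shows that every indexed term is lowercase and that "Optionen" ends up only in the solution field.

```java
import java.util.*;

// Toy model (NOT Lucene) of the fields produced by the indexing loop.
class ToyIndex {
    // field name -> document id -> terms in token order
    static final Map<String, Map<String, List<String>>> FIELDS = new HashMap<>();

    // Hypothetical stand-in for StandardAnalyzer: split on anything that is not
    // a letter or digit and lowercase each token. No stop-word removal.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    static void add(String id, String field, String text) {
        FIELDS.computeIfAbsent(field, f -> new HashMap<>()).put(id, tokenize(text));
    }

    // Exact, case-sensitive term lookup, like a TermQuery on an analyzed index.
    static Set<String> search(String field, String term) {
        Set<String> hits = new TreeSet<>();
        Map<String, List<String>> docs = FIELDS.getOrDefault(field, Collections.emptyMap());
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Row 3 of the sample data, split into the same fields as the loop.
        add("3", "problem", "Makros in Word funktionieren nicht");
        add("3", "solution", "Optionen geändert (Medium Level). Extras - Optionen - Makros");
        System.out.println(FIELDS.get("problem").get("3"));
        System.out.println(search("problem", "optionen"));  // []: not in this field
        System.out.println(search("solution", "optionen")); // [3]
        System.out.println(search("problem", "Word"));      // []: the index holds "word"
    }
}
```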
No.  Query                    Hits  Expected
1    Novell                   2     2
2    Nove*                    0     2
3    Novel~                   0     2
4    N?vell                   0     2
5    +Scanner +Baum           0     0
6    Scanner -USB             0     0
7    usb AND liste            1     1
8    Mikros~                  1     1
9    Ehe*                     1     0
10   "Word Optionen"~10       1     0
11   "Browser Sexseite"~10    1     1
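Queries 2 to 4 all target the indexed term novell. One thing worth checking is case, assuming the Lucene version in use behaves this way (lowercasing of such terms depends on the version and QueryParser settings): QueryParser does not run wildcard and fuzzy terms through the analyzer, while StandardAnalyzer lowercases everything at index time, so if the capital N in Nove* or N?vell is left untouched those patterns cannot match. The plain-Java sketch below (not Lucene) imitates both matchers against lowercased index terms. `wildcardToRegex`, `wildcardMatches` and `editDistance` are hypothetical helpers, and Lucene's fuzzy query compares a similarity score derived from the edit distance against a threshold rather than using the raw distance. Query 9 (Ehe*) returning a hit does not fit this explanation neatly, so treat it as something to test, not a diagnosis.

```java
import java.util.*;
import java.util.regex.Pattern;

// Plain-Java imitation (NOT Lucene) of wildcard and fuzzy matching against
// terms that a lowercasing analyzer put into the index.
class CaseCheck {
    // Some terms from rows 1 and 2, as a lowercasing analyzer would index them.
    static final List<String> INDEXED = Arrays.asList("novell", "server", "baum");

    // Hypothetical helper: '*' matches any run of characters, '?' exactly one.
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    // Indexed terms the pattern matches; matching is case-sensitive.
    static List<String> wildcardMatches(String wildcard) {
        Pattern p = wildcardToRegex(wildcard);
        List<String> out = new ArrayList<>();
        for (String term : INDEXED) {
            if (p.matcher(term).matches()) {
                out.add(term);
            }
        }
        return out;
    }

    // Levenshtein distance: insertions, deletions and substitutions cost 1 each.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(substitute, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        for (String q : new String[] {"Nove*", "N?vell"}) {
            System.out.println(q + " as typed: " + wildcardMatches(q)
                    + ", lowercased: " + wildcardMatches(q.toLowerCase(Locale.ROOT)));
        }
        System.out.println("Novel -> novell: " + editDistance("Novel", "novell"));
        System.out.println("novel -> novell: " + editDistance("novel", "novell"));
    }
}
```

Running `main` shows Nove* and N?vell matching nothing as typed but matching novell once lowercased, and the edit distance from Novel to novell dropping from 2 to 1 after lowercasing.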
Boolean operators never seem to be a problem (OK, they're the easiest to
implement ;-)). Fuzzy, wildcard and proximity searches, however, don't
return the hits I expect. Numbers 10 and 11 are the most surprising: at 10
the proximity search fails, but at 11 everything is fine!
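One factor worth comparing between 10 and 11 is which field the terms live in. `QueryParser.parse(query, "problem", analyzer)` sends every term without a `field:` prefix to the default field problem, and in the sample data "Optionen" appears only in the solution field of row 3, whereas "Browser" and "Sexseite" are both in the problem field of row 7. The plain-Java sketch below (not Lucene) checks a two-term proximity query against a single field. The `withinSlop` helper is hypothetical and simplifies Lucene's sloppy phrase matching to "an occurrence of each term within slop positions of each other, in either order"; `tokenize` is a rough lowercasing stand-in for StandardAnalyzer.

```java
import java.util.*;

// Plain-Java imitation (NOT Lucene) of a two-term proximity query "a b"~slop
// evaluated against a single field.
class ProximityCheck {
    // Rough stand-in for StandardAnalyzer: split on non-letters/digits, lowercase.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Simplified slop test (hypothetical): true if an occurrence of a and a
    // different occurrence of b are at most `slop` positions apart, either order.
    static boolean withinSlop(List<String> terms, String a, String b, int slop) {
        for (int i = 0; i < terms.size(); i++) {
            if (!terms.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < terms.size(); j++) {
                if (j != i && terms.get(j).equals(b) && Math.abs(i - j) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Only the problem text is used, mirroring the default field passed to
        // QueryParser.parse. Phrase terms are analyzed, so they arrive lowercased.
        List<String> problem3 = tokenize("Makros in Word funktionieren nicht");
        List<String> problem7 = tokenize("Browser bringt imme eine Sexseite als Startseite");
        System.out.println("Query 10 vs row 3: " + withinSlop(problem3, "word", "optionen", 10));
        System.out.println("Query 11 vs row 7: " + withinSlop(problem7, "browser", "sexseite", 10));
    }
}
```

Running `main` prints false for query 10 against row 3's problem field and true for query 11 against row 7's. A field-qualified clause such as `solution:Optionen` searches the solution field instead, but a single phrase query cannot span two fields.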
If you have any tips on how to improve the search process, please let
me know.
Regards,
Marcel