I’m having trouble searching for an exact phrase with Lucene 4.7. As a test, I
loaded the Declaration of Independence into a Lucene index, one document per
line of text. When I search for “it becomes” I get two hits: one for the line
that contains “it becomes” and another for a line that only has
“becomes” at the end of the line.
Expected:
“When, in the course of human events, it becomes necessary for one people to
dissolve the”
Not Expected:
“powers from the consent of the governed. That whenever any form of government
becomes”
Below is my indexing code, followed by my search code:
// Index Code
File idxLinesFile = new File("test lucene index");
Directory idxLinesDir = FSDirectory.open(idxLinesFile);
Analyzer analyzerLines = new StandardAnalyzer(Version.LUCENE_47);
IndexWriterConfig iwcLines = new IndexWriterConfig(Version.LUCENE_47,
        analyzerLines);
// Append to an existing index, otherwise create a new one
iwcLines.setOpenMode((idxLinesFile.exists()) ?
        IndexWriterConfig.OpenMode.CREATE_OR_APPEND :
        IndexWriterConfig.OpenMode.CREATE);
IndexWriter writerLines = new IndexWriter(idxLinesDir, iwcLines);
for (int i = 0; i < arrayListOfLines.size(); i++)
{
    // One document per line, keyed by "pageNumber:lineNumber"
    Document docLine = new Document();
    docLine.add(new StringField("docIndex", String.format("%06d", pageNumber)
            + ":" + String.format("%06d", i), Field.Store.YES));
    docLine.add(new TextField("lineText", arrayListOfLines.get(i),
            Field.Store.YES));
    writerLines.addDocument(docLine);
}
// Commit the added documents so a reader opened afterwards can see them
writerLines.close();
// Search Code
// idxFile is a java.io.File pointing at the index directory
Directory idxDir = FSDirectory.open(idxFile);
IndexReader reader = DirectoryReader.open(idxDir);
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
QueryParser parser = new QueryParser(Version.LUCENE_47, "lineText", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
parser.setPhraseSlop(0);
// Build the phrase query directly from the raw text, using the parser's analyzer
Query query = parser.createPhraseQuery("lineText", "it becomes");
// First pass counts the hits; second pass retrieves that many documents
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, collector);
TopDocs results = searcher.search(query, Math.max(1, collector.getTotalHits()));
ScoreDoc[] hits = results.scoreDocs;
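
One thing that might be worth ruling out is stop-word removal. As far as I know, StandardAnalyzer filters a default English stop-word list that includes "it", so the phrase may reach the index as just "becomes". The following is only a plain-Java sketch of that idea, not Lucene code: the class name, the `analyze` and `matchesPhrase` helpers, and the shortened stop-word list are all hypothetical, and the real analyzer also keeps position gaps where stop words were removed, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordPhraseSketch {
    // Assumption: a small subset of the analyzer's default English stop words.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "in", "it", "of", "that", "the", "to"));

    // Lowercase the text, split on non-letters, and drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the analyzed phrase appears as a contiguous run of tokens
    // in the analyzed line (position gaps are ignored in this sketch).
    static boolean matchesPhrase(String line, String phrase) {
        List<String> queryTokens = analyze(phrase);
        if (queryTokens.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(analyze(line), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String expected = "When, in the course of human events, it becomes "
                + "necessary for one people to dissolve the";
        String notExpected = "powers from the consent of the governed. "
                + "That whenever any form of government becomes";

        System.out.println("query tokens: " + analyze("it becomes"));
        System.out.println("expected line matches: "
                + matchesPhrase(expected, "it becomes"));
        System.out.println("unexpected line matches: "
                + matchesPhrase(notExpected, "it becomes"));
    }
}
```

If the real analyzer behaves the same way, both lines would match because the query is effectively a search for "becomes" alone. Indexing and searching with an analyzer that has no stop words would be one way to test that guess.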