Thanks for the feedback on the highlighter package.
Here are some responses to the issues raised:
>>What may be the performance implications, seeing that
>>the method query.rewrite(reader) seems to be called twice: once for
>>querying, once for highlighting?
One solution is to do this before calling the highlighter:
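// expand multi-term queries (prefix, wildcard etc.) into primitive term queries
query = query.rewrite(reader);
Hits hits = searcher.search(query);
// the query is already rewritten, so the highlighter need not expand it again
QueryHighlightExtractor h =
    new QueryHighlightExtractor(reader, query, new StandardAnalyzer(), "<B>", "</B>");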
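A rough plain-Java model of why this helps, with no Lucene dependency: expanding a prefix such as multi* means walking the term dictionary, so doing it once and handing the expanded terms to the highlighter avoids a second walk. The class and method names (RewriteOnceSketch, expand, highlight) are hypothetical, and the TreeSet is only an assumed stand-in for an index's term dictionary, not how Lucene actually stores terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RewriteOnceSketch {

    // Hypothetical stand-in for query.rewrite(reader): expand a prefix
    // against a sorted term dictionary into the primitive terms it matches.
    static List<String> expand(String prefix, TreeSet<String> termDictionary) {
        List<String> expanded = new ArrayList<>();
        // tailSet starts at the first term >= prefix
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            expanded.add(term);
        }
        return expanded;
    }

    // Hypothetical highlighter: wraps tokens found in the already-expanded
    // term set; it never touches the term dictionary itself.
    static String highlight(String text, Set<String> terms, String pre, String post) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(terms.contains(token) ? pre + token + post : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TreeSet<String> dictionary =
                new TreeSet<>(List.of("apple", "multione", "multitwo", "zebra"));
        // Expand ONCE...
        List<String> primitiveTerms = expand("multi", dictionary);
        // ...then reuse the same expansion for highlighting.
        System.out.println(highlight("multione and apple",
                new TreeSet<>(primitiveTerms), "<B>", "</B>")); // <B>multione</B> and apple
    }
}
```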
Would you want the highlighter to enforce this optimisation by insisting that
queries passed to it are not multi-term queries that still require expansion? That
way we would not need to pass an IndexReader to the Highlighter constructors, and
could instead declare them to throw a "QueryNotRewrittenException" if an
un-expanded query is passed.
It seems a bit heavy-handed to beat people over the head like this for not passing
a pre-optimised query. Maybe the best solution is to remove support for
highlighting multi-term queries from the highlighter entirely: the caller must call
rewrite() BEFORE calling the highlighter if they expect multi-term queries to be
highlighted. I think that's my favoured approach. Thoughts?
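A minimal sketch of the two options, assuming a hypothetical simplified query made of plain terms and un-expanded wildcard clauses. The Clause class and termsToHighlight method are invented for illustration, and QueryNotRewrittenException is only the name proposed above, not an existing class. With strict set, un-rewritten queries are rejected; without it, multi-term clauses are silently skipped, which is the favoured approach.

```java
import java.util.ArrayList;
import java.util.List;

public class UnrewrittenQueryPolicySketch {

    // Hypothetical exception (the name proposed in the discussion).
    static class QueryNotRewrittenException extends Exception {
        QueryNotRewrittenException(String message) {
            super(message);
        }
    }

    // Hypothetical simplified clause: a plain term, or an
    // un-expanded multi-term pattern such as "multi*".
    static class Clause {
        final String text;
        final boolean multiTerm;

        Clause(String text, boolean multiTerm) {
            this.text = text;
            this.multiTerm = multiTerm;
        }
    }

    // strict == true:  the "heavy-handed" option, reject un-rewritten queries.
    // strict == false: the favoured option, ignore multi-term clauses, so the
    //                  caller must rewrite() first to get them highlighted.
    static List<String> termsToHighlight(List<Clause> clauses, boolean strict)
            throws QueryNotRewrittenException {
        List<String> terms = new ArrayList<>();
        for (Clause c : clauses) {
            if (c.multiTerm) {
                if (strict) {
                    throw new QueryNotRewrittenException("call rewrite() first: " + c.text);
                }
                continue; // not expanded, so nothing to highlight
            }
            terms.add(c.text);
        }
        return terms;
    }

    public static void main(String[] args) throws Exception {
        List<Clause> query = List.of(new Clause("lucene", false), new Clause("multi*", true));
        System.out.println(termsToHighlight(query, false)); // [lucene]
        try {
            termsToHighlight(query, true);
        } catch (QueryNotRewrittenException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```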
>>Is it possible to split the logic into two classes, which:
>>a) handle highlighting
>>b) grab Query terms (method getTerms and its dependencies)
The TextHighlighter class is already a class that purely handles highlighting
(independent of query terms).
The getTerms() function is public in QueryHighlighter because I thought it might be
of use to some people. I guess I could move it into a static function on a utility
class somewhere, but I struggle to think of uses outside of text highlighting.
Surely the query classes offer better metadata about a query (e.g. phrases, boosts
etc.), so does this "Term[] getTerms(Query)" function warrant a specialised home
anywhere?
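For reference, a getTerms()-style static utility amounts to a recursive walk over the query tree. This sketch uses hypothetical stand-in classes (TermQ, PhraseQ, BooleanQ), not Lucene's own query classes, and returns "field:text" strings rather than Term objects. It also illustrates the point above: once a query is flattened to a list of terms, the phrase constraint (and any boosts) is gone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QueryTermsSketch {

    // Hypothetical stand-ins for query classes (NOT Lucene's API).
    interface Q {}

    static class TermQ implements Q {
        final String field;
        final String text;

        TermQ(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    static class PhraseQ implements Q {
        final String field;
        final List<String> words;

        PhraseQ(String field, String... words) {
            this.field = field;
            this.words = Arrays.asList(words);
        }
    }

    static class BooleanQ implements Q {
        final List<Q> clauses;

        BooleanQ(Q... clauses) {
            this.clauses = Arrays.asList(clauses);
        }
    }

    // Static utility: flatten a query tree into "field:text" terms.
    static List<String> getTerms(Q query) {
        List<String> terms = new ArrayList<>();
        collect(query, terms);
        return terms;
    }

    private static void collect(Q query, List<String> terms) {
        if (query instanceof TermQ) {
            TermQ t = (TermQ) query;
            terms.add(t.field + ":" + t.text);
        } else if (query instanceof PhraseQ) {
            PhraseQ p = (PhraseQ) query;
            // Only individual words are recorded; the phrase constraint is lost.
            for (String w : p.words) {
                terms.add(p.field + ":" + w);
            }
        } else if (query instanceof BooleanQ) {
            for (Q clause : ((BooleanQ) query).clauses) {
                collect(clause, terms); // recurse into nested clauses
            }
        }
        // Any other query type contributes no terms in this sketch.
    }

    public static void main(String[] args) {
        Q q = new BooleanQ(new TermQ("contents", "lucene"),
                new PhraseQ("contents", "query", "parser"));
        System.out.println(getTerms(q)); // [contents:lucene, contents:query, contents:parser]
    }
}
```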
>>Does anyone know if this package supports highlighting in MultiSearcher
>>environments?
This works but looks ugly:
// set up index 1
RAMDirectory ramDir1 = new RAMDirectory();
IndexWriter writer1 = new IndexWriter(ramDir1, new StandardAnalyzer(), true);
Document d = new Document();
Field f = new Field(FIELD_NAME, "multiOne", true, true, true); // stored, indexed, tokenized
d.add(f);
writer1.addDocument(d);
writer1.optimize();
writer1.close();
IndexReader reader1 = IndexReader.open(ramDir1);
// set up index 2
RAMDirectory ramDir2 = new RAMDirectory();
IndexWriter writer2 = new IndexWriter(ramDir2, new StandardAnalyzer(), true);
d = new Document();
f = new Field(FIELD_NAME, "multiTwo", true, true, true);
d.add(f);
writer2.addDocument(d);
writer2.optimize();
writer2.close();
IndexReader reader2 = IndexReader.open(ramDir2);
// search both indexes at once
IndexSearcher[] searchers = new IndexSearcher[2];
searchers[0] = new IndexSearcher(ramDir1);
searchers[1] = new IndexSearcher(ramDir2);
MultiSearcher multiSearcher = new MultiSearcher(searchers);
Query query = QueryParser.parse("multi*", FIELD_NAME, new StandardAnalyzer());
System.out.println("Searching for: " + query.toString(FIELD_NAME));
Hits hits = multiSearcher.search(query);
// Now expand the query separately against each index (each has its own terms)
Query[] expandedQueries = new Query[2];
expandedQueries[0] = query.rewrite(reader1);
expandedQueries[1] = query.rewrite(reader2);
// merge the per-index expansions into a single query
Query combinedExpandedQuery = query.combine(expandedQueries);
// NB: the reader passed here is irrelevant, as the query is already expanded
QueryHighlightExtractor highlighter = new QueryHighlightExtractor(this, reader2,
    combinedExpandedQuery, new StandardAnalyzer());
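Stripped of the Lucene plumbing, the workaround above boils down to: expand the prefix separately against each index's terms, then merge the results. A minimal sketch under stated assumptions: the TreeSet term dictionaries and the expand/combine helpers are hypothetical, and combine here is simply a de-duplicated union, which only approximates what query.combine() produces. Terms are lowercase because StandardAnalyzer lowercases what it indexes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MultiIndexExpansionSketch {

    // Hypothetical per-index prefix expansion (stand-in for query.rewrite(readerN)).
    static Set<String> expand(String prefix, TreeSet<String> termDictionary) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            expanded.add(term);
        }
        return expanded;
    }

    // Merge per-index expansions into one de-duplicated term set,
    // loosely analogous to query.combine(expandedQueries).
    static Set<String> combine(List<Set<String>> perIndex) {
        Set<String> combined = new LinkedHashSet<>();
        for (Set<String> terms : perIndex) {
            combined.addAll(terms);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Each index has its own term dictionary, so each is expanded separately.
        TreeSet<String> index1 = new TreeSet<>(List.of("multione", "other"));
        TreeSet<String> index2 = new TreeSet<>(List.of("multione", "multitwo"));
        Set<String> combined = combine(List.of(expand("multi", index1), expand("multi", index2)));
        System.out.println(combined); // [multione, multitwo]
    }
}
```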
Thanks again
Mark