[ https://issues.apache.org/jira/browse/LUCENE-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Barry updated LUCENE-4559:
-------------------------------
Summary: PerFieldSimilarityWrapper issue with queryNorm() and coord()
(was: PerFieldSimilarityWrapper)
> PerFieldSimilarityWrapper issue with queryNorm() and coord()
> ------------------------------------------------------------
>
> Key: LUCENE-4559
> URL: https://issues.apache.org/jira/browse/LUCENE-4559
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: 4.0
> Reporter: Joel Barry
> Priority: Minor
>
> This issue requests that the documentation be clarified for the current
> behavior of queryNorm() and coord() on PerFieldSimilarityWrapper, and
> that support be added for the use case described below.
> The documentation for PerFieldSimilarityWrapper (Lucene 4.0) says:
> {noformat}
> Subclasses should implement get(String) to return an appropriate
> Similarity (for example, using field-specific parameter values) for
> the field.
> {noformat}
> This is misleading because of the behavior of queryNorm() and
> coord(). The Similarity returned from get() is never consulted for these
> methods. Instead, the wrapper's own methods are called, and unless a
> subclass overrides them they are inherited from Similarity, which returns
> 1 for both. I understand that this is because these methods apply to the
> query as a whole rather than to a single field. However, consider the
> following: a PerFieldSimilarityWrapper with no per-field behavior (its
> get() just returns a DefaultSimilarity) scores differently from
> DefaultSimilarity itself:
> {noformat}
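> import static org.junit.Assert.assertEquals;
>
> import java.io.IOException;
>
> import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field.Store;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.DirectoryReader;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.search.TopDocs;
> import org.apache.lucene.search.similarities.DefaultSimilarity;
> import org.apache.lucene.search.similarities.PerFieldSimilarityWrapper;
> import org.apache.lucene.search.similarities.Similarity;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.util.Version;
> import org.junit.Test;
>
> // Per-field wrapper with no per-field behavior: every field gets DefaultSimilarity.
> class MyPerFieldSimilarity1 extends PerFieldSimilarityWrapper {
>   @Override
>   public Similarity get(String name) {
>     return new DefaultSimilarity();
>   }
> }
>
> public class PerFieldSimilarityWrapperTest {
>   // Indexes one document and returns the score of a one-term query,
>   // using the given Similarity at both index and search time.
>   private float runQuery(Similarity similarity) throws IOException {
>     IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40,
>         new WhitespaceAnalyzer(Version.LUCENE_40));
>     config.setSimilarity(similarity);
>     Directory dir = new RAMDirectory();
>     IndexWriter writer = new IndexWriter(dir, config);
>     Document doc = new Document();
>     doc.add(new TextField("A-field", "first", Store.YES));
>     writer.addDocument(doc);
>     writer.close(); // commits and releases the write lock
>
>     IndexReader reader = DirectoryReader.open(dir);
>     IndexSearcher searcher = new IndexSearcher(reader);
>     searcher.setSimilarity(similarity);
>     TermQuery query = new TermQuery(new Term("A-field", "first"));
>     TopDocs topDocs = searcher.search(query, 1);
>     float score = topDocs.scoreDocs[0].score;
>     reader.close();
>     dir.close();
>     return score;
>   }
>
>   @Test
>   public void testSimple() throws Exception {
>     float score1 = runQuery(new DefaultSimilarity());
>     float score2 = runQuery(new MyPerFieldSimilarity1());
>     assertEquals(score1, score2, 0.0001);
>     // java.lang.AssertionError:
>     // expected:<0.3068528175354004> but was:<0.09415864944458008>
>   }
> }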
> {noformat}
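The gap between the two numbers can be reproduced by hand. This is a plain-Java sketch with no Lucene dependency; it assumes Lucene 4.0's classic TF-IDF formula for this single-term, single-document case (tf = 1, fieldNorm = 1, boost = 1), and `ScoreArithmetic` is a hypothetical helper name. With DefaultSimilarity the query norm works out to 1/idf, so the score collapses to idf; with the wrapper's inherited queryNorm of 1, the score becomes idf squared.

```java
// Hand calculation of the two scores in the assertion message, assuming
// Lucene 4.0's classic TF-IDF scoring for a one-term query against a
// one-document index where tf = 1, fieldNorm = 1 and boost = 1.
class ScoreArithmetic {

    // DefaultSimilarity.idf(docFreq, numDocs) = 1 + ln(numDocs / (docFreq + 1))
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // score = tf * idf * (queryWeight * queryNorm) * fieldNorm, with queryWeight = idf
    static double score(double queryNorm) {
        double idf = idf(1, 1); // 1 + ln(1/2), about 0.3069
        return 1.0 * idf * (idf * queryNorm) * 1.0;
    }

    // DefaultSimilarity.queryNorm = 1 / sqrt(sumOfSquaredWeights) = 1 / idf here
    static double defaultScore() {
        double idf = idf(1, 1);
        return score(1.0 / Math.sqrt(idf * idf));
    }

    // The wrapper inherits Similarity.queryNorm, which always returns 1
    static double wrapperScore() {
        return score(1.0);
    }

    public static void main(String[] args) {
        System.out.printf("DefaultSimilarity: %.4f%n", defaultScore());
        System.out.printf("wrapper:           %.4f%n", wrapperScore());
    }
}
```

Running `main` should print values of about 0.3069 and 0.0942, in line with the expected and actual values in the assertion message.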
> One solution is to override and forward, e.g.
> {noformat}
> class MyPerFieldSimilarity1 extends PerFieldSimilarityWrapper {
>   @Override
>   public Similarity get(String name) {
>     return new DefaultSimilarity();
>   }
>
>   // Forward the query-level hooks to the per-field Similarity.
>   @Override
>   public float coord(int overlap, int maxOverlap) {
>     return get("dummy").coord(overlap, maxOverlap);
>   }
>
>   @Override
>   public float queryNorm(float valueForNormalization) {
>     return get("dummy").queryNorm(valueForNormalization);
>   }
> }
> {noformat}
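The effect of this workaround, including on coord(), can be modeled without Lucene. In this sketch `Sim`, `ClassicSim`, `PerFieldWrapper` and `ForwardingWrapper` are hypothetical stand-ins, not Lucene classes: `Sim` mimics the Lucene 4.0 base-class defaults (both hooks return 1), and `ClassicSim` mimics DefaultSimilarity's formulas (`overlap / maxOverlap` and `1 / sqrt(sumOfSquaredWeights)`). The forwarding variant receives the query-level similarity as a constructor argument instead of calling get() with a placeholder field name.

```java
import java.util.function.Function;

// Hypothetical stand-in for the two query-level hooks of Lucene 4.0's
// Similarity; like the real base class, both return 1 by default.
abstract class Sim {
    float coord(int overlap, int maxOverlap) { return 1f; }
    float queryNorm(float sumOfSquaredWeights) { return 1f; }
}

// Mirrors DefaultSimilarity's coord() and queryNorm() formulas.
class ClassicSim extends Sim {
    @Override float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
    @Override float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }
}

// Models PerFieldSimilarityWrapper: per-field lookup, while coord() and
// queryNorm() keep the inherited defaults unless overridden.
abstract class PerFieldWrapper extends Sim {
    abstract Sim get(String field);
}

// Forwarding variant: the query-level delegate is fixed at construction.
class ForwardingWrapper extends PerFieldWrapper {
    private final Sim queryLevel;
    private final Function<String, Sim> perField;

    ForwardingWrapper(Sim queryLevel, Function<String, Sim> perField) {
        this.queryLevel = queryLevel;
        this.perField = perField;
    }

    @Override Sim get(String field) { return perField.apply(field); }

    @Override float coord(int overlap, int maxOverlap) {
        return queryLevel.coord(overlap, maxOverlap);
    }

    @Override float queryNorm(float sumOfSquaredWeights) {
        return queryLevel.queryNorm(sumOfSquaredWeights);
    }
}

class ForwardingDemo {
    public static void main(String[] args) {
        Sim classic = new ClassicSim();
        // No overrides: the hooks fall back to the base-class value of 1.
        PerFieldWrapper plain = new PerFieldWrapper() {
            @Override Sim get(String field) { return classic; }
        };
        PerFieldWrapper forwarding = new ForwardingWrapper(classic, field -> classic);
        System.out.println(plain.coord(1, 2));         // 1.0
        System.out.println(forwarding.coord(1, 2));    // 0.5
        System.out.println(plain.queryNorm(4f));       // 1.0
        System.out.println(forwarding.queryNorm(4f));  // 0.5
    }
}
```

Holding the delegate in a field also avoids allocating a new similarity on every coord() or queryNorm() call, which the get("dummy") version does when get() returns a fresh DefaultSimilarity. It does not address the field-group problem, because the delegate is still the same for every query.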
> However, coord() and queryNorm() receive no information about which
> fields the query uses, hence the arbitrary "dummy" field name.
> Suppose an application arranges documents so that there are two
> distinct field groupings:
> {noformat}
> Document:
> A-field1
> A-field2
> A-field3
> B-field1
> B-field2
> B-field3
> {noformat}
> The application creates queries that use either the A fields or the B
> fields, but never both in the same query. It then seems reasonable for
> PerFieldSimilarityWrapper to provide a way for queryNorm() and coord()
> to use a different Similarity for each group of fields. This cannot be
> done with the current implementation, because neither method is told
> which fields the query uses.
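One possible shape for the requested feature is a query-level hook that is told which fields the query uses. This is a hypothetical plain-Java sketch, not an existing or proposed Lucene API: `QuerySim`, `FieldGroupSimilarity` and the rule that a field's group is the text before its first `-` (so `A-field1` belongs to group `A`) are all assumptions made for illustration. Queries that mix groups are rejected, matching the stated constraint that A and B fields never appear together.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the query-level hooks; the defaults mirror
// Lucene 4.0's Similarity base class, which returns 1 for both.
abstract class QuerySim {
    float coord(int overlap, int maxOverlap) { return 1f; }
    float queryNorm(float sumOfSquaredWeights) { return 1f; }
}

// Mirrors DefaultSimilarity's coord() and queryNorm() formulas.
class ClassicQuerySim extends QuerySim {
    @Override float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
    @Override float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }
}

// Hypothetical group-aware resolver; not part of Lucene.
class FieldGroupSimilarity {
    private final Map<String, QuerySim> byGroup;

    FieldGroupSimilarity(Map<String, QuerySim> byGroup) {
        this.byGroup = byGroup;
    }

    // Assumed naming convention: the group is the prefix before the first '-'.
    static String groupOf(String field) {
        int dash = field.indexOf('-');
        return dash < 0 ? field : field.substring(0, dash);
    }

    // Returns the similarity of the single group the query's fields belong to.
    QuerySim forQuery(Set<String> queryFields) {
        Set<String> groups = new HashSet<>();
        for (String field : queryFields) {
            groups.add(groupOf(field));
        }
        if (groups.size() != 1) {
            throw new IllegalArgumentException("query must use exactly one field group: " + groups);
        }
        String group = groups.iterator().next();
        QuerySim sim = byGroup.get(group);
        if (sim == null) {
            throw new IllegalArgumentException("no similarity for group " + group);
        }
        return sim;
    }

    float coord(Set<String> queryFields, int overlap, int maxOverlap) {
        return forQuery(queryFields).coord(overlap, maxOverlap);
    }

    float queryNorm(Set<String> queryFields, float sumOfSquaredWeights) {
        return forQuery(queryFields).queryNorm(sumOfSquaredWeights);
    }
}

class FieldGroupDemo {
    public static void main(String[] args) {
        FieldGroupSimilarity sims = new FieldGroupSimilarity(Map.of(
            "A", new ClassicQuerySim(),  // A fields: classic coord/queryNorm
            "B", new QuerySim() {}));    // B fields: both hooks return 1
        System.out.println(sims.coord(Set.of("A-field1", "A-field2"), 1, 2));  // 0.5
        System.out.println(sims.coord(Set.of("B-field1", "B-field3"), 1, 2));  // 1.0
    }
}
```

In real code the set of fields would have to be collected from the query before scoring, since in Lucene 4.0 coord() and queryNorm() receive only numeric arguments.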
--
This message is automatically generated by JIRA.