[ 
https://issues.apache.org/jira/browse/LUCY-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marvin Humphrey updated LUCY-183:
---------------------------------

    Attachment: normalization.patch

The attached "normalization.patch" adds a boolean "subordinate" parameter to
Query_Make_Compiler().  It defaults to false and indicates whether a Query is
a child of another Query.  When "subordinate" is false (because the Query is
the top-most node), Query_Make_Compiler() implementations are now expected to
invoke Compiler_Normalize() on the newly created Compiler object.
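
As a rough illustration of the control flow described above, here is a minimal
Python sketch.  It is not Lucy code: the classes below are hypothetical
stand-ins that only borrow Lucy's names, the real Make_Compiler() also takes
other arguments (such as a searcher and a boost) that are omitted here, and
the weights are made up.

```python
import math


class Compiler:
    """Hypothetical stand-in for a compiled query node (not Lucy's class)."""

    def __init__(self, raw_weight, children=()):
        self.raw_weight = raw_weight      # assumed per-node weight, e.g. IDF
        self.children = list(children)
        self.norm_factor = 1.0
        self.normalize_calls = 0          # bookkeeping for this demo only

    def sum_of_squared_weights(self):
        own = self.raw_weight ** 2
        return own + sum(c.sum_of_squared_weights() for c in self.children)

    def apply_norm_factor(self, factor):
        self.norm_factor *= factor
        for child in self.children:
            child.apply_norm_factor(factor)

    def normalize(self):
        # One uniform multiplier for the whole tree rooted at this node.
        self.normalize_calls += 1
        total = self.sum_of_squared_weights()
        factor = 1.0 / math.sqrt(total) if total > 0 else 1.0
        self.apply_norm_factor(factor)


class TermQuery:
    """Single-node query (illustrative only)."""

    def __init__(self, weight):
        self.weight = weight

    def make_compiler(self, subordinate=False):
        compiler = Compiler(self.weight)
        if not subordinate:               # top-most node: normalize here
            compiler.normalize()
        return compiler


class ORQuery:
    """Query with child nodes (illustrative only)."""

    def __init__(self, children):
        self.children = children

    def make_compiler(self, subordinate=False):
        # Children are always subordinate, so they skip normalization.
        kids = [q.make_compiler(subordinate=True) for q in self.children]
        compiler = Compiler(0.0, kids)
        if not subordinate:
            compiler.normalize()
        return compiler


top = ORQuery([TermQuery(3.0), TermQuery(4.0)]).make_compiler()
```

In this sketch only the top-most compiler runs normalize(), and both children
end up with the same norm_factor, which is the uniform scaling the patch is
meant to restore.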

Giving Make_Compiler() an extra parameter and responsibility for invoking
Normalize() is technically an API change, but it should have little impact on
code in the wild.  The only query types with ranking affected by this bug are
those with child nodes, such as ANDQuery, ORQuery and RequiredOptionalQuery,
but I'm unaware of anyone who has subclassed those.  Single-node Query
subclasses (e.g. LucyX::Search::WildCardQuery) will need to have their
normalize() calls moved from their Compiler constructors to make_compiler(),
but IDF for a WildCardQuery is an imprecise notion to begin with.

This patch is also safe for any existing Searcher subclasses or other classes
which currently invoke Make_Compiler() -- the default value of "false" for
"subordinate" is correct for such situations.

The patch applies cleanly against both trunk and the 0.2 branch; I plan to
commit it against both in a day or so.
                
> Eliminate spurious "extra" query normalization
> ----------------------------------------------
>
>                 Key: LUCY-183
>                 URL: https://issues.apache.org/jira/browse/LUCY-183
>             Project: Lucy
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 0.1.0 (incubating), 0.2.0 (incubating), 0.2.1 (incubating)
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>             Fix For: 0.2.2 (incubating), 0.3.0 (incubating)
>
>         Attachments: normalization.patch
>
>
> Query normalization is supposed to scale all scores uniformly by a simple
> multiplier, but the child nodes in complex queries are presently getting
> "extra" normalization applied to them.  This has the effect of scaling
> different subqueries by different amounts, changing the balance of the
> subqueries within a complex query, interfering with IDF weighting and subtly
> degrading relevancy.
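
The effect described in the issue can be illustrated with a small numeric
model.  This is a simplified Python sketch, not Lucy's scoring code: the
normalize() helper and the two weights are assumptions chosen only to show how
an extra per-child normalization pass changes the balance between subqueries.

```python
import math


def normalize(weights):
    """Scale a list of weights by 1/sqrt(sum of squared weights)."""
    total = sum(w * w for w in weights)
    factor = 1.0 / math.sqrt(total) if total > 0 else 1.0
    return [w * factor for w in weights]


raw = [3.0, 4.0]  # made-up IDF-style weights for two subqueries

# Intended behavior: a single normalization pass over the whole query.
correct = normalize(raw)

# Bug sketch: each child normalizes itself first, then the parent
# normalizes the already-normalized children a second time.
pre_normalized = [normalize([w])[0] for w in raw]
buggy = normalize(pre_normalized)

correct_ratio = correct[0] / correct[1]  # stays at the original 3:4 balance
buggy_ratio = buggy[0] / buggy[1]        # collapses to 1:1 in this model
```

With a single pass, the two subqueries keep their original 3:4 balance; with
the extra pass, both collapse to the same weight, so the IDF-style
distinction between them is lost.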


        
