[ https://issues.apache.org/jira/browse/LUCENE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-830.
---------------------------------------
Resolution: Fixed
Fix Version/s: 4.0
As of 4.0, when norms are missing we drop norms for the entire field. Before
4.0, we invented a fake norm for documents that were missing the field or that
omitted norms for it.
Also, as of 4.0, you can write a custom norm provider and a custom similarity,
so if you really want to, it's possible (in theory!) to have a sparse norms
data structure.
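
To make the "sparse norms data structure" idea concrete, here is a minimal
sketch in plain Java. It is not Lucene API: the class name SparseNorms, its
methods, and the use of a HashMap are all assumptions made for illustration.
It stores a norm byte only for documents that actually have the field and
returns a default for every other document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse norms holder for one field; not part of Lucene. */
final class SparseNorms {
    // Only documents that actually have the field get an entry.
    private final Map<Integer, Byte> norms = new HashMap<>();
    private final byte defaultNorm;

    SparseNorms(byte defaultNorm) {
        this.defaultNorm = defaultNorm;
    }

    /** Records the norm byte for a document that has the field. */
    void set(int docId, byte norm) {
        norms.put(docId, norm);
    }

    /** Returns the stored norm, or the default when the document has none. */
    byte get(int docId) {
        Byte b = norms.get(docId);
        return b == null ? defaultNorm : b;
    }

    /** Number of documents that actually carry a norm. */
    int storedEntries() {
        return norms.size();
    }
}
```

A boxed HashMap costs far more than 1 byte per entry, so a structure like this
would only pay off when a field is present in a small fraction of documents,
and the lookup is slower than indexing into a dense byte array.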
> norms file can become unexpectedly enormous
> -------------------------------------------
>
> Key: LUCENE-830
> URL: https://issues.apache.org/jira/browse/LUCENE-830
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 2.1
> Reporter: Michael McCandless
> Priority: Minor
> Fix For: 4.0
>
>
> Spinoff from this user thread:
> http://www.gossamer-threads.com/lists/lucene/java-user/46754
> Norms are not stored sparsely, so even if a doc doesn't have field X
> we still use up 1 byte in the norms file (and in memory when that
> field is searched) for that segment. I think this is done for
> performance at search time?
> For indexes that have a large # documents where each document can have
> wildly varying fields, each segment's norms will use (# documents) times
> (# fields seen in that segment) bytes. When optimize merges all segments,
> both factors grow at once (total documents times the union of all fields),
> so the norms file for the single segment can require far more storage than
> the sum of all previous segments' norm files.
> I think it's uncommon to have a huge number of distinct fields (?) so
> we would need a solution that doesn't hurt the more common case where
> most documents have the same fields. Maybe something analogous to how
> bitvectors are now optionally stored sparsely?
> One simple workaround is to disable norms.
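
The size blow-up described in the issue can be checked with a little
arithmetic. The sketch below is plain Java, not Lucene code; the class
DenseNormsEstimate, its Segment type, and the sample numbers are made up for
illustration. It assumes dense storage of 1 byte per document per field seen
in the segment, as the issue describes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Back-of-envelope estimate of dense norms storage; not Lucene code. */
final class DenseNormsEstimate {

    /** A hypothetical segment: document count plus the field names seen in it. */
    static final class Segment {
        final long docCount;
        final Set<String> fields;

        Segment(long docCount, String... fields) {
            this.docCount = docCount;
            this.fields = new HashSet<>(Arrays.asList(fields));
        }
    }

    /** Norm bytes for one segment: documents times distinct fields. */
    static long segmentBytes(Segment s) {
        return s.docCount * s.fields.size();
    }

    /** Total norm bytes while every segment keeps its own norms. */
    static long sumOfSegments(List<Segment> segments) {
        long total = 0;
        for (Segment s : segments) {
            total += segmentBytes(s);
        }
        return total;
    }

    /** Norm bytes after a full merge: total documents times the union of fields. */
    static long mergedBytes(List<Segment> segments) {
        long docs = 0;
        Set<String> union = new HashSet<>();
        for (Segment s : segments) {
            docs += s.docCount;
            union.addAll(s.fields);
        }
        return docs * union.size();
    }

    public static void main(String[] args) {
        // Three segments of 1000 documents, each with two fields no other segment has.
        List<Segment> segs = Arrays.asList(
            new Segment(1000, "a", "b"),
            new Segment(1000, "c", "d"),
            new Segment(1000, "e", "f"));
        System.out.println(sumOfSegments(segs)); // 6000
        System.out.println(mergedBytes(segs));   // 18000
    }
}
```

With these sample numbers the merged segment needs three times the norm bytes
of the separate segments. When all segments share the same fields, the two
totals are equal, which is the common case the issue wants to keep fast.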