norms file can become unexpectedly enormous
-------------------------------------------

                 Key: LUCENE-830
                 URL: https://issues.apache.org/jira/browse/LUCENE-830
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.1
            Reporter: Michael McCandless
            Priority: Minor


Spinoff from this user thread:

    http://www.gossamer-threads.com/lists/lucene/java-user/46754

Norms are not stored sparsely, so even if a doc doesn't have field X
we still use up 1 byte in the norms file (and in memory when that
field is searched) for that doc in that segment. I think this is done
for performance at search time?

For indexes that have a large number of documents where each document
can have wildly varying fields, each segment will use (# documents)
times (# fields seen in that segment) bytes. When optimize merges all
segments, the single resulting segment needs (total # documents) times
(# distinct fields across all segments) bytes, so its norms file can
require far more storage than the sum of all previous segments' norms
files.

I think it's uncommon to have a huge number of distinct fields (?), so
we would need a solution that doesn't hurt the more common case where
most documents have the same fields. Maybe something analogous to how
bitvectors are now optionally stored sparsely?

One simple workaround is to disable norms.
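
To make the size arithmetic above concrete, here is a rough sketch in Python (not Lucene code). It assumes the simplified model described in this issue: exactly 1 byte per document per distinct field seen in a segment, ignoring any file headers and any fields that have norms disabled. The segment sizes and field names are hypothetical, chosen so that no two segments share a field.

```python
def norms_bytes(num_docs, fields):
    """Norms storage for one segment under the simplified model:
    1 byte per document for every distinct field seen in the segment."""
    return num_docs * len(fields)


def merged_norms_bytes(segments):
    """Norms storage after merging all segments into a single segment.

    The merged segment holds every document and has seen the union of
    all fields, so it pays for (total docs) x (distinct fields)."""
    total_docs = sum(num_docs for num_docs, _ in segments)
    all_fields = set().union(*(fields for _, fields in segments))
    return norms_bytes(total_docs, all_fields)


# Hypothetical index: three segments of 1000 docs each, with disjoint fields.
segments = [
    (1000, {"a", "b"}),
    (1000, {"c", "d"}),
    (1000, {"e", "f"}),
]

before = sum(norms_bytes(n, f) for n, f in segments)
after = merged_norms_bytes(segments)

print("sum over segments:", before)  # 3 * (1000 * 2) = 6000 bytes
print("after merge:      ", after)   # 3000 * 6 = 18000 bytes
```

In this made-up case the merged norms are three times the size of the per-segment total. When all segments share the same fields the two numbers are equal, which is why the common case is unaffected.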