[ https://issues.apache.org/jira/browse/LUCENE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480520 ]
Doron Cohen commented on LUCENE-830:
------------------------------------

> One simple workaround is to disable norms.

You mean for some of the fields, using Fieldable's setOmitNorms().

For large indexes, I would think that most fields would be indexed with
norms omitted (setOmitNorms(true)), except for one (content) or two
(subject?) fields where length normalization and/or boosting are of
importance. In such cases there would not really be a problem.

Consider, for example, an index created to add textual search to a
database application by mapping the index field names to the database
"textual column" names. If more than one table is indexed, and the
textual column names differ between the tables, then yes, with that
straightforward mapping there would be waste: lots of unused bytes.

One workaround for such applications could be to map the textual columns
of all tables to a single textual field in Lucene, though then they would
have to filter by a table-name field (which they might do anyhow).

> norms file can become unexpectedly enormous
> -------------------------------------------
>
>                 Key: LUCENE-830
>                 URL: https://issues.apache.org/jira/browse/LUCENE-830
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Priority: Minor
>
> Spinoff from this user thread:
>   http://www.gossamer-threads.com/lists/lucene/java-user/46754
> Norms are not stored sparsely, so even if a doc doesn't have field X
> we still use up 1 byte in the norms file (and in memory when that
> field is searched) for that segment. I think this is done for
> performance at search time?
> For indexes that have a large # documents where each document can have
> wildly varying fields, each segment will use # documents times # fields
> seen in that segment. When optimize merges all segments, that product
> grows multiplicatively so the norms file for the single segment will
> require far more storage than the sum of all previous segments' norm
> files.
> I think it's uncommon to have a huge number of distinct fields (?) so
> we would need a solution that doesn't hurt the more common case where
> most documents have the same fields. Maybe something analogous to how
> bitvectors are now optionally stored sparsely?
> One simple workaround is to disable norms.
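
To make the arithmetic in the issue description concrete, here is a rough
sketch of the "# documents times # fields" estimate before and after an
optimize. It is only a model of the behaviour described above, not Lucene
code: the Segment class, the method names, and the field names
(orders_notes, customers_remarks) are all made up for illustration, and it
assumes every listed field has norms enabled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NormsSizeEstimate {

    /** Hypothetical summary of a segment: doc count plus the fields that have norms. */
    public static final class Segment {
        final long docCount;
        final Set<String> fieldsWithNorms;

        public Segment(long docCount, String... fieldsWithNorms) {
            this.docCount = docCount;
            this.fieldsWithNorms = new HashSet<>(Arrays.asList(fieldsWithNorms));
        }
    }

    /** One byte per document per field seen in the segment, as the issue describes. */
    public static long normsBytes(Segment s) {
        return s.docCount * s.fieldsWithNorms.size();
    }

    /** Sum of the norms sizes of the separate, unmerged segments. */
    public static long normsBytesBeforeMerge(List<Segment> segments) {
        long total = 0;
        for (Segment s : segments) {
            total += normsBytes(s);
        }
        return total;
    }

    /** Model of optimize: a single segment with all docs and the union of all fields. */
    public static Segment merge(List<Segment> segments) {
        long docs = 0;
        Set<String> fields = new HashSet<>();
        for (Segment s : segments) {
            docs += s.docCount;
            fields.addAll(s.fieldsWithNorms);
        }
        return new Segment(docs, fields.toArray(new String[0]));
    }

    public static void main(String[] args) {
        // Two tables indexed into separate segments, each with its own
        // (hypothetical) textual column name.
        List<Segment> segments = Arrays.asList(
                new Segment(1_000_000, "orders_notes"),
                new Segment(1_000_000, "customers_remarks"));

        System.out.println("before merge: " + normsBytesBeforeMerge(segments) + " bytes");
        System.out.println("after merge:  " + normsBytes(merge(segments)) + " bytes");
    }
}
```

In this model the two segments need 2,000,000 bytes of norms between them,
while the merged segment needs 4,000,000, because every document now pays
one byte for each of the two fields. If both tables were mapped to one
shared field name, as suggested above, the merged segment would stay at
2,000,000 bytes.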