[
https://issues.apache.org/jira/browse/LUCENE-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened LUCENE-756:
---------------------------------------
I would like to propose some small improvements to this nice feature.
I've worked out a patch (will attach shortly). Doron, if you agree
(or once we've iterated on it), I'll commit it. Thanks!
Proposed changes:
* Renamed "withNrm()" to "getHasMergedNorms()" to be more
descriptive. Also renamed the field to "hasMergedNorms".
* Explicitly store "hasMergedNorms" in the segments_N file.
I think in general we should favor storing things like this
explicitly instead of relying on IO operations (fileExists).
We've made great progress lately in reducing such IO operations so
I'd like to keep that up when possible :)
I created a new FORMAT_MERGED_NORMS in SegmentInfos for this. The
change is fully backwards compatible (old indices work fine). I
extended TestBackwardsCompatibility to test this.
This then has the nice side effect of not having to create the
fleeting CompoundFileReader in "SegmentInfo.getHasMergedNorms"
(which was somewhat spooky to me) for indices written to after
this is committed. For indices written to before this gets
committed but after the first version was committed (10 days ago),
the check is still needed so I've left it in there with a comment.
* Fixed the TestDoc unit test to actually create and return
SegmentInfo instances rather than constructing a new SegmentInfo
every time (which causes problems whenever we add something to
SegmentInfo). The test is still correct, but it will hold up better
as we make changes to SegmentInfo.
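
To illustrate the second item, here is a minimal sketch of the idea, not the actual patch. It assumes the format marker is a negative int where a more negative value means a newer format. The class name, the FORMAT_PRE_MERGED_NORMS constant, both numeric values, and the byte layout are hypothetical; only the FORMAT_MERGED_NORMS name comes from the description above. A newer file stores the flag explicitly, and an older file falls back to the directory check.

```java
import java.nio.ByteBuffer;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; this is NOT the real SegmentInfos/SegmentInfo code.
final class MergedNormsFormatSketch {
    // Assumed values: more negative means a newer file format.
    static final int FORMAT_PRE_MERGED_NORMS = -3; // hypothetical older format
    static final int FORMAT_MERGED_NORMS = -4;     // name from the proposal, value assumed

    // Writes the format marker, plus the explicit flag when the format supports it.
    static byte[] write(int format, boolean hasMergedNorms) {
        ByteBuffer buf = ByteBuffer.allocate(5);
        buf.putInt(format);
        if (format <= FORMAT_MERGED_NORMS) {
            buf.put((byte) (hasMergedNorms ? 1 : 0));
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    // Reads the flag. Only files in the older format trigger the fallback,
    // which stands in for the fileExists-style check on the directory.
    static boolean readHasMergedNorms(byte[] data, BooleanSupplier legacyFileCheck) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int format = buf.getInt();
        if (format <= FORMAT_MERGED_NORMS) {
            return buf.get() == 1; // stored explicitly, no extra IO needed
        }
        return legacyFileCheck.getAsBoolean(); // older index: must look at the files
    }
}
```

With this layout, a reader never calls the fallback for data written in the newer format, which is the IO saving described above. Data written in the older format still reads correctly because the fallback stays in place.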
> Maintain norms in a single file .nrm
> ------------------------------------
>
> Key: LUCENE-756
> URL: https://issues.apache.org/jira/browse/LUCENE-756
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Doron Cohen
> Assigned To: Doron Cohen
> Priority: Minor
> Attachments: index.premergednorms.cfs.zip,
> index.premergednorms.nocfs.zip, LUCENE-756-Jan16.patch, nrm.patch.2.txt,
> nrm.patch.3.txt, nrm.patch.txt
>
>
> Non-compound indexes are ~10% faster at indexing and perform about 50% of
> the IO activity of compound indexes, but their file descriptor footprint is
> much higher.
> By maintaining all field norms in a single .nrm file, we can bound the number
> of files used by non-compound indexes, and possibly allow more applications
> to use this format.
> More details on the motivation for this in:
> http://www.nabble.com/potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-tf2826909.html
> (in particular
> http://www.nabble.com/Re%3A-potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-p7910403.html).