[
https://issues.apache.org/jira/browse/LUCENE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-5969:
--------------------------------
Attachment: LUCENE-5969.patch
Here is an initial patch.
I already refactored the tests so that bumping the default codec/PF/DVF is much easier:
you just change methods in TestUtil.
I also added a check to TestAllFilesHaveCodecHeader to look for duplicate codec
names (it's commented out until we fix TVF).
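To illustrate the kind of duplicate-name check described above, here is a minimal standalone sketch. It is not the code in TestAllFilesHaveCodecHeader: the class name, the method name, and the idea of feeding it a map from file extension to header codec name are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: reports codec names that
// appear in the headers of more than one file extension.
class DuplicateCodecNameCheck {

    // Input: file extension -> codec name found in that file's header.
    // Output: codec name -> extensions sharing it (only names used 2+ times).
    static Map<String, Set<String>> findDuplicates(Map<String, String> codecNameByExtension) {
        Map<String, Set<String>> extensionsByName = new TreeMap<>();
        for (Map.Entry<String, String> e : codecNameByExtension.entrySet()) {
            extensionsByName
                .computeIfAbsent(e.getValue(), k -> new TreeSet<>())
                .add(e.getKey());
        }
        // Keep only the names claimed by more than one extension.
        extensionsByName.values().removeIf(exts -> exts.size() < 2);
        return extensionsByName;
    }
}
```

An empty result means every extension uses its own codec name; a test could fail with the returned map as the message otherwise.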
In this patch, I added new infos formats (.SI and .FNM) that don't have all the
confusing backwards version stuff. The fnm reader and writer (and checkindex)
hard-check fieldinfos consistency on both read and write.
Also checkindex got a little cleanup, so that "foreign" readers
(TestUtil.checkReader) get fieldinfos and livedocs validation, whereas they did
not before.
I added CodecUtil.checkFooter(input, Throwable) to give better exceptions when
things are corrupt (e.g., it adds suppressed exception for checksum status),
and cut over .SI/.FNM/.NVM/.DVM to use it. I also added standalone tests for
this.
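A rough standalone sketch of the idea behind checkFooter(input, Throwable), for readers who have not opened the patch: when reading fails, recompute the checksum and attach its status to the original exception as a suppressed exception, so the report shows both the parse error and whether the bytes themselves are corrupt. This is not the patch's code. It works on a byte[] rather than an IndexInput, takes an IOException rather than a Throwable, and the footer layout (a trailing 8-byte CRC32) and all names are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch, not Lucene's CodecUtil.
class FooterCheckSketch {
    static final int FOOTER_BYTES = Long.BYTES;

    // Appends a CRC32 of the payload as an 8-byte footer (assumed layout).
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + FOOTER_BYTES)
            .put(payload)
            .putLong(crc.getValue())
            .array();
    }

    // Recomputes the checksum over everything before the footer.
    static String checksumStatus(byte[] file) {
        if (file.length < FOOTER_BYTES) {
            return "checksum status indeterminate: footer is missing";
        }
        int payloadLength = file.length - FOOTER_BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long stored = ByteBuffer.wrap(file, payloadLength, FOOTER_BYTES).getLong();
        return crc.getValue() == stored ? "checksum passed" : "checksum failed";
    }

    // If reading already failed, rethrow that failure with the checksum
    // status attached as a suppressed exception; otherwise throw only
    // when the checksum itself does not verify.
    static void checkFooter(byte[] file, IOException priorException) throws IOException {
        String status = checksumStatus(file);
        if (priorException != null) {
            priorException.addSuppressed(new IOException(status));
            throw priorException;
        }
        if (!status.equals("checksum passed")) {
            throw new IOException(status);
        }
    }
}
```

A reader would call this from a finally block, passing whatever exception was caught while decoding (or null on success).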
I want to cut over other parts too (like .FDX, .TVX, ...) but we shouldn't use
this method until we remove all the conditional versioning and cut "clean"
versions (also without bogus codec ids), otherwise I think it's confusing and
potentially unsafe.
However, I think we should start with this, to unblock Mike's cleanup of SI
version handling and other work? I don't think we have to write 5.0's format in
one day.
After we are happy with the 5.0 format, we can then clean up the back compat (trunk
doesn't need all the 4.x back compat, etc).
> Add Lucene50Codec
> -----------------
>
> Key: LUCENE-5969
> URL: https://issues.apache.org/jira/browse/LUCENE-5969
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: 5.0, 6.0
>
> Attachments: LUCENE-5969.patch
>
>
> Spinoff from LUCENE-5952:
> * Fix .si to write Version as 3 ints, not a String that requires parsing at
> read time.
> * Lucene42TermVectorsFormat should not use the same codecName as
> Lucene41StoredFieldsFormat
> It would also be nice if we had a "bumpCodecVersion" script so rolling a new
> codec is not so daunting.
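To make the first bullet of the issue description concrete, here is a small standalone sketch contrasting the two encodings. It uses java.nio.ByteBuffer as a stand-in for Lucene's DataOutput/DataInput, and the class and method names are made up for illustration; the actual .si layout is whatever the patch defines.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the .si writer from the patch.
class VersionEncodingSketch {

    // Three fixed-width ints: nothing to parse on the read side.
    static byte[] writeVersion(int major, int minor, int bugfix) {
        return ByteBuffer.allocate(3 * Integer.BYTES)
            .putInt(major)
            .putInt(minor)
            .putInt(bugfix)
            .array();
    }

    static int[] readVersion(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        return new int[] {in.getInt(), in.getInt(), in.getInt()};
    }

    // The String form needs splitting and number parsing at read time,
    // and can fail on malformed input.
    static int[] parseVersionString(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.bugfix, got: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        };
    }
}
```

Both paths yield the same three numbers for well-formed input; the int form simply has no malformed-string case to handle.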