[ 
https://issues.apache.org/jira/browse/LUCENE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5969:
--------------------------------
    Attachment: LUCENE-5969.patch

Here is an initial patch.

I already refactored the tests so that bumping the default codec/PF/DVF is much 
easier: you just change methods in TestUtil.

I also added a check to TestAllFilesHaveCodecHeader to look for duplicate codec 
names (it's commented out until we fix TVF).
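
As a rough illustration of what a duplicate-codec-name check can look like, here 
is a standalone sketch. It is not the code from the patch: the class name, the 
method name, and the map of file name to codec name are hypothetical, and the 
real test reads the codec name from each file's header rather than taking it 
as input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DuplicateCodecNameSketch {

    /**
     * Returns the codec names that are claimed by more than one file extension.
     * The input maps a file name (e.g. "_0.fdt") to the codec name found in its
     * header; both are supplied by the caller in this sketch.
     */
    static Set<String> findDuplicates(Map<String, String> fileToCodecName) {
        Map<String, String> nameToExtension = new HashMap<>();
        Set<String> duplicates = new TreeSet<>();
        for (Map.Entry<String, String> entry : fileToCodecName.entrySet()) {
            String file = entry.getKey();
            String codecName = entry.getValue();
            int dot = file.lastIndexOf('.');
            String extension = dot < 0 ? "" : file.substring(dot + 1);
            // Remember the first extension seen for this codec name.
            String previous = nameToExtension.putIfAbsent(codecName, extension);
            // The same name under a different extension means two formats share it.
            if (previous != null && !previous.equals(extension)) {
                duplicates.add(codecName);
            }
        }
        return duplicates;
    }
}
```

Several segments using the same extension with the same codec name is fine; 
only a name shared across different extensions is reported.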

In this patch, I added new infos formats (.SI and .FNM) that don't have all the 
confusing backwards version stuff. The fnm reader and writer (and checkindex) 
hard-check fieldinfos consistency on both read and write.

Also checkindex got a little cleanup, so that "foreign" readers 
(TestUtil.checkReader) get fieldinfos and livedocs validation, whereas they did 
not before.

I added CodecUtil.checkFooter(input, Throwable) to give better exceptions when 
things are corrupt (e.g., it adds a suppressed exception for checksum status), 
and cut over .SI/.FNM/.NVM/.DVM to use it. I also added standalone tests for 
this.
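
To make the idea concrete, here is a minimal plain-Java sketch of the pattern: 
read, remember any exception, then verify the checksum in a finally block and 
attach its status to the original exception. Everything in it is an assumption 
made for illustration (the class, the 8-byte CRC32 footer layout, the 
byte[]-based signatures); it is not Lucene's implementation of 
CodecUtil.checkFooter.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class FooterCheckSketch {

    /** Appends an 8-byte CRC32 footer to the payload (hypothetical layout). */
    static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /**
     * Verifies the footer. If a prior exception exists, the checksum status is
     * added to it as a suppressed exception and the prior exception is rethrown.
     */
    static void checkFooter(byte[] file, Throwable priorException) throws IOException {
        boolean ok = false;
        String status = "file too short to contain a checksum footer";
        if (file.length >= 8) {
            int payloadLength = file.length - 8;
            CRC32 crc = new CRC32();
            crc.update(file, 0, payloadLength);
            long stored = ByteBuffer.wrap(file, payloadLength, 8).getLong();
            ok = crc.getValue() == stored;
            status = ok ? "checksum passed" : "checksum failed";
        }
        if (priorException != null) {
            priorException.addSuppressed(new IOException(status));
            if (priorException instanceof IOException) throw (IOException) priorException;
            if (priorException instanceof RuntimeException) throw (RuntimeException) priorException;
            if (priorException instanceof Error) throw (Error) priorException;
            throw new IOException(priorException);
        }
        if (!ok) {
            throw new IOException(status);
        }
    }

    /** Reads one non-negative int, then always verifies the footer. */
    static int readFirstInt(byte[] file) throws IOException {
        Throwable priorE = null;
        int value = 0;
        try {
            value = ByteBuffer.wrap(file).getInt();
            if (value < 0) {
                throw new IOException("invalid value: " + value);
            }
        } catch (Throwable t) {
            priorE = t; // remember the failure; checkFooter rethrows it
        } finally {
            checkFooter(file, priorE);
        }
        return value;
    }
}
```

With this shape, a parse failure on a corrupt file surfaces as the original 
exception plus a suppressed "checksum failed", which tells the reader the bytes 
themselves were damaged rather than the parsing code being wrong.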

I want to cut over other parts too (like .FDX, .TVX, ...), but we shouldn't use 
this method until we remove all the conditional versioning and cut "clean" 
versions (also without bogus codec ids); otherwise I think it's confusing and 
potentially unsafe.

However, I think we should start with this, to unblock Mike's cleanup of SI 
version handling and other work? I don't think we have to write 5.0's format in 
one day.

After we are happy with the 5.0 format, we can then clean up the back compat 
(trunk doesn't need all the 4.x back compat, etc).

> Add Lucene50Codec
> -----------------
>
>                 Key: LUCENE-5969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5969
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 5.0, 6.0
>
>         Attachments: LUCENE-5969.patch
>
>
> Spinoff from LUCENE-5952:
>   * Fix .si to write Version as 3 ints, not a String that requires parsing at 
> read time.
>   * Lucene42TermVectorsFormat should not use the same codecName as 
> Lucene41StoredFieldsFormat
> It would also be nice if we had a "bumpCodecVersion" script so rolling a new 
> codec is not so daunting.



