[ https://issues.apache.org/jira/browse/LUCENE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158804#comment-13158804 ]
Uwe Schindler commented on LUCENE-3607:
---------------------------------------
+1 to using a codec in 4.0.
In general it's not enough to only make the Codec write the files without
timestamps, versions and diagnostic data. It's also important that you use a
MergeScheduler and MergePolicy that produce predictable segment merges. To make an
index that can be reproduced (except for the version numbers and timestamps), you
need to use SerialMergeScheduler (ConcurrentMergeScheduler uses multiple
threads, and parallelization might change the order of merges and how early
they occur) and also a MergePolicy whose merge decisions do not depend on
platform features. So use good old LogDocMergePolicy (it merges according to the
number of docs). And finally, your Analyzers must produce predictable results even
when the underlying Java Runtime and its Unicode version change (use the ICU analyzers).
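A minimal sketch of the writer setup described above, assuming the Lucene 4.0 API (IndexWriterConfig, SerialMergeScheduler, LogDocMergePolicy) with lucene-core on the classpath. The class and method names are hypothetical. No codec is installed because the comment only proposes one; the analyzer is passed in and, per the advice above, would ideally be ICU-based.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogDocMergePolicy;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper: not part of Lucene or the Eclipse help build.
final class ReproducibleIndexWriterFactory {

    static IndexWriter open(File indexDir, Analyzer analyzer) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);

        // Run merges one at a time on the calling thread instead of in the
        // background threads that ConcurrentMergeScheduler uses.
        config.setMergeScheduler(new SerialMergeScheduler());

        // Choose merges by document count rather than by segment byte size.
        config.setMergePolicy(new LogDocMergePolicy());

        // Hypothetical: a custom Codec that omits timestamps, version strings
        // and diagnostics would be installed here via config.setCodec(...).

        Directory dir = FSDirectory.open(indexDir);
        return new IndexWriter(dir, config);
    }
}
```

Even with this configuration, the comment notes that version numbers and timestamps still differ unless the codec itself stops writing them.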
> Lucene Index files can not be reproduced faithfully (due to timestamps
> embedded)
> --------------------------------------------------------------------------------
>
> Key: LUCENE-3607
> URL: https://issues.apache.org/jira/browse/LUCENE-3607
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 2.9.1
> Environment: Eclipse 3.7
> Reporter: Martin Oberhuber
> Assignee: Michael McCandless
>
> Eclipse 3.7 uses Lucene 2.9.1 for indexing online help content. A
> pre-generated help index can be shipped together with online content. As per
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=364979
> it turns out that the help index cannot be faithfully reproduced during a
> build, because there are timestamps embedded in the index files, and the
> "NameCounter" field in segments_2 has different contents on every build.
> Not being able to faithfully reproduce the index from identical source bits
> undermines trust in the index (and software delivery) being correct.
> I'm wondering whether this is a known issue and/or has already been addressed
> in a newer Lucene version?
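One way to check the reporter's requirement (identical index bytes from identical sources) is to hash every file in two independently built index directories and list the files that differ. The sketch below uses only the JDK; the class IndexDigest and its methods are hypothetical and not part of Lucene or the Eclipse help build.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical tool for comparing two index build outputs byte for byte.
final class IndexDigest {

    /** Maps each regular file's path (relative to dir) to its SHA-256 hex digest. */
    static Map<String, String> digests(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> result = new TreeMap<>();
        for (Path p : files) {
            String name = dir.relativize(p).toString().replace('\\', '/');
            result.put(name, sha256Hex(Files.readAllBytes(p)));
        }
        return result;
    }

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Names whose contents differ, or that exist in only one of the two directories. */
    static SortedSet<String> differingFiles(Path a, Path b) throws IOException {
        Map<String, String> da = digests(a);
        Map<String, String> db = digests(b);
        SortedSet<String> names = new TreeSet<>(da.keySet());
        names.addAll(db.keySet());
        names.removeIf(name -> Objects.equals(da.get(name), db.get(name)));
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IndexDigest <indexDirA> <indexDirB>");
            System.exit(2);
        }
        SortedSet<String> diff = differingFiles(Paths.get(args[0]), Paths.get(args[1]));
        for (String name : diff) {
            System.out.println("differs: " + name);
        }
        // Exit status 0 means the two index directories are byte-identical.
        System.exit(diff.isEmpty() ? 0 : 1);
    }
}
```

Running it as `java IndexDigest build1/index build2/index` prints each differing file and exits with 0 only when every file matches.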
--
This message is automatically generated by JIRA.