[ https://issues.apache.org/jira/browse/LUCENE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962746#comment-13962746 ]
Michael McCandless commented on LUCENE-5578:
--------------------------------------------
I like the new BaseIndexFormatTestCase!
The javadocs for BaseIndexFormatTestCase.extensions say "Return the
file name extensions used by this stored fields format", but they should
say "by this codec" (it's not just stored fields).
Maybe rename fileLengths to "bytesUsedByExtension"? (It sums up
by extension.) And rename "expectedChecksums" to "expectedBytesUsed"
or something similar?
Instead of requiring the test case to report every extension it writes
to (that seems error-prone), can't we just list the directory and collate
by extension ourselves? Or, if some extensions (which ones?) won't be
the same size after indexing in two different ways, maybe we invert the
method and just return those?
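A rough plain-Java sketch of the "list the directory and collate by extension" idea, with the inverted method modeled as a set of excluded extensions. `ExtensionSizes` and its methods are hypothetical, not part of Lucene's test framework; the input is assumed to be a file-name-to-length map, which a test could build from `Directory.listAll()` and `Directory.fileLength(String)`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper (not Lucene API): sums file sizes per extension from a
// file-name -> length listing, e.g. one built from a Lucene Directory.
final class ExtensionSizes {

  // Returns the text after the last '.', or "" if the name has no dot
  // (for example "segments_2").
  static String extension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot == -1 ? "" : fileName.substring(dot + 1);
  }

  // Collates byte counts by extension, skipping the extensions listed in
  // `excluded` (the ones expected to differ between two indexing runs).
  static Map<String, Long> bytesUsedByExtension(Map<String, Long> fileLengths,
                                                Set<String> excluded) {
    Map<String, Long> result = new TreeMap<>();
    for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
      String ext = extension(e.getKey());
      if (excluded.contains(ext)) {
        continue;
      }
      result.merge(ext, e.getValue(), Long::sum);
    }
    return result;
  }
}
```

A test could then compute this map for the two differently-built indexes and compare them with a single assertEquals, so only the (presumably short) list of extensions known to differ has to be maintained by hand.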
> Stored fields might accumulate checksums on merges
> --------------------------------------------------
>
> Key: LUCENE-5578
> URL: https://issues.apache.org/jira/browse/LUCENE-5578
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Blocker
> Fix For: 4.8
>
> Attachments: LUCENE-5578.patch, LUCENE-5578.patch, LUCENE-5578.patch
>
>
> The bulk merge operation of our stored fields format is optimized in order to
> avoid decompressing data when not needed. In order to know the offset of the
> end of the current block, it either consults the stored fields index, or uses
> {{fieldsStream.length()}} for the last chunk.
> However, we just added checksums at the end of index files, so the bulk merge
> might currently copy the existing checksum along with the last chunk and then
> write a new checksum, so that checksums accumulate with every merge.
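A toy model of the accumulation described in the issue, in plain Java without Lucene. Assumptions: each "file" is a byte array of data followed by a footer that is just an 8-byte CRC32 value, which is simpler than Lucene's real footer. `FooterMergeDemo`, `mergeBuggy` and `mergeFixed` are hypothetical names: the first ends the last chunk at the file length, as the description says the bulk merge does, while the second stops before the footer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Toy model only: the footer here is an 8-byte CRC32 value, not Lucene's layout.
final class FooterMergeDemo {

  static final int FOOTER_LENGTH = Long.BYTES;

  // Appends a footer holding the CRC32 of all preceding bytes.
  static byte[] withFooter(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return ByteBuffer.allocate(data.length + FOOTER_LENGTH)
        .put(data)
        .putLong(crc.getValue())
        .array();
  }

  // Buggy bulk copy: uses the full file length as the end of the last chunk,
  // so each input's old footer is copied before a new footer is written.
  static byte[] mergeBuggy(byte[] a, byte[] b) {
    return withFooter(concat(a, a.length, b, b.length));
  }

  // Fixed bulk copy: the last chunk ends at length - FOOTER_LENGTH.
  static byte[] mergeFixed(byte[] a, byte[] b) {
    return withFooter(concat(a, a.length - FOOTER_LENGTH, b, b.length - FOOTER_LENGTH));
  }

  // Copies the first aLen bytes of a followed by the first bLen bytes of b.
  static byte[] concat(byte[] a, int aLen, byte[] b, int bLen) {
    byte[] out = Arrays.copyOf(a, aLen + bLen);
    System.arraycopy(b, 0, out, aLen, bLen);
    return out;
  }
}
```

With two 10-byte segments, one merge yields 44 bytes with the buggy copy versus 28 with the fixed one; merging that result with itself again yields 96 bytes (56 of them footers) versus 48, so the waste grows with every merge round.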
--
This message was sent by Atlassian JIRA
(v6.2#6252)