[ https://issues.apache.org/jira/browse/LUCENE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated LUCENE-4050:
--------------------------------------
Issue Type: Bug (was: Improvement)
It's actually a bug: it's not possible to cleanly extend the index format via
Codecs without addressing this issue.
> Make segments_NN file codec-independent
> ---------------------------------------
>
> Key: LUCENE-4050
> URL: https://issues.apache.org/jira/browse/LUCENE-4050
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/codecs
> Reporter: Andrzej Bialecki
> Fix For: 4.0
>
>
> I propose to change the format of SegmentInfos file (segments_NN) to use
> plain text instead of the current binary format.
> SegmentInfos file represents a commit point, and it also declares what codecs
> were used for writing each of the segments that the commit point consists of.
> However, this is a chicken-and-egg situation: in theory the format of this
> file is customizable via Codec.getSegmentInfosFormat, but in practice we
> first have to discover which codec implementation wrote this file, so
> SegmentCoreReaders assumes a fixed binary layout for the preamble of this
> file, which contains the codec name... and then the file is read again,
> this time using the right Codec.
> This is ugly. Instead I propose to use a simple plain-text format, either
> line-oriented properties or JSON, designed so that newer versions could
> easily extend it and so that no special Codec is needed to read and parse
> it. Consequently we could remove SegmentInfosFormat altogether and instead
> add SegmentInfoFormat (note the singular) to Codec to read individual
> per-segment SegmentInfo-s in a codec-specific way. E.g. for the Lucene40
> codec we could either add another file or extend the .fnm file
> (FieldInfos) to also contain this information.
> Then the plain text SegmentInfos would contain just the following information:
> * list of global files for this commit point (if any)
> * list of segments for this commit point, and their corresponding codec class
> names
> * user data map
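The two-pass read described in the issue can be illustrated with a toy example. The sketch below is hypothetical: the class name TwoPassSegmentsReader, the magic number and the layout (an int magic, a UTF-encoded codec name, then a codec-specific body) are assumptions for illustration, not Lucene's actual segments_NN format. It shows why the preamble must be decodable without knowing the codec: the reader parses it once just to learn the codec name, then parses the file again as that codec.

```java
import java.io.*;
import java.util.*;

// Hypothetical illustration of "read the preamble, then re-read the file".
// The layout (magic int + UTF codec name + codec-specific body) is an assumption,
// not Lucene's real segments_NN format.
final class TwoPassSegmentsReader {
  static final int MAGIC = 0x5345474D; // hypothetical magic number

  // Writes a file whose body only the named codec is supposed to understand.
  static byte[] write(String codecName, int segmentCount) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(MAGIC);
      out.writeUTF(codecName);    // fixed-layout preamble every reader must decode
      out.writeInt(segmentCount); // codec-specific body (trivial here)
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Pass 1: decode only the preamble to discover which codec wrote the file.
  static String peekCodecName(byte[] file) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      if (in.readInt() != MAGIC) throw new IllegalArgumentException("not a segments file");
      return in.readUTF();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Pass 2: read the whole file again, this time "as" the discovered codec.
  static int readSegmentCount(byte[] file, Set<String> knownCodecs) {
    String codec = peekCodecName(file);
    if (!knownCodecs.contains(codec)) {
      throw new IllegalArgumentException("unknown codec: " + codec);
    }
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
      in.readInt();  // skip the magic again
      in.readUTF();  // skip the codec name again
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Whatever the codec, the preamble layout is frozen, which is the coupling the issue wants to remove.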
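The proposed split, with a codec-independent commit point plus a per-segment, codec-specific SegmentInfoFormat, might look roughly like the following. This is a minimal sketch under stated assumptions: SegmentInfoFormatSketch, CommitPointSketch, SplitFormatDemo, the name-keyed codec registry and the "<segment>.seginfo" file holding a doc count are all hypothetical, and the in-memory map stands in for a Directory. It is not the Lucene API.

```java
import java.util.*;

// Hypothetical per-segment, codec-specific format (names are illustrative only).
interface SegmentInfoFormatSketch {
  // Reads codec-specific per-segment details (here just a doc count) for one segment.
  int readDocCount(String segmentName, Map<String, String> directory);
}

// Codec-independent commit point: segment name -> name of the codec that wrote it.
final class CommitPointSketch {
  final Map<String, String> segmentCodecs = new LinkedHashMap<>();
}

final class SplitFormatDemo {
  // Hypothetical registry of codecs by name.
  static final Map<String, SegmentInfoFormatSketch> CODECS = new HashMap<>();
  static {
    // Toy format: the doc count lives in a hypothetical per-segment file "<seg>.seginfo".
    CODECS.put("Lucene40", (seg, dir) -> Integer.parseInt(dir.get(seg + ".seginfo")));
  }

  // The commit point is read without any codec; each segment is then decoded
  // by the codec recorded for it.
  static int totalDocs(CommitPointSketch commit, Map<String, String> directory) {
    int total = 0;
    for (Map.Entry<String, String> e : commit.segmentCodecs.entrySet()) {
      SegmentInfoFormatSketch format = CODECS.get(e.getValue());
      if (format == null) {
        throw new IllegalArgumentException("unknown codec: " + e.getValue());
      }
      total += format.readDocCount(e.getKey(), directory);
    }
    return total;
  }
}
```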
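For the plain-text SegmentInfos itself, a line-oriented properties variant holding the three items listed in the issue could be sketched with java.util.Properties. The key names (version, globalFiles, segment.*, userData.*), the class TextSegmentInfos and the example codec class names are hypothetical; JSON would work equally well. Unknown keys are skipped on read, which is one way newer versions could add entries without breaking older readers.

```java
import java.io.*;
import java.util.*;

// Minimal sketch of a plain-text segments_NN using java.util.Properties.
// All key names are hypothetical.
final class TextSegmentInfos {
  final List<String> globalFiles = new ArrayList<>();
  final Map<String, String> segmentCodecs = new TreeMap<>(); // segment name -> codec class name
  final Map<String, String> userData = new TreeMap<>();

  String write() {
    Properties p = new Properties();
    p.setProperty("version", "1");
    p.setProperty("globalFiles", String.join(",", globalFiles));
    segmentCodecs.forEach((seg, codec) -> p.setProperty("segment." + seg, codec));
    userData.forEach((k, v) -> p.setProperty("userData." + k, v));
    StringWriter out = new StringWriter();
    try {
      p.store(out, null); // escapes special characters; entry order is unspecified
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  static TextSegmentInfos read(String text) {
    Properties p = new Properties();
    try {
      p.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    TextSegmentInfos infos = new TextSegmentInfos();
    String files = p.getProperty("globalFiles", "");
    if (!files.isEmpty()) infos.globalFiles.addAll(Arrays.asList(files.split(",")));
    for (String key : p.stringPropertyNames()) {
      if (key.startsWith("segment.")) {
        infos.segmentCodecs.put(key.substring("segment.".length()), p.getProperty(key));
      } else if (key.startsWith("userData.")) {
        infos.userData.put(key.substring("userData.".length()), p.getProperty(key));
      }
      // Any other key is ignored, so newer writers can add entries safely.
    }
    return infos;
  }
}
```

Properties does not preserve entry order and store() always writes a timestamp comment line, so the output is not byte-for-byte reproducible; a small hand-written writer with sorted keys would avoid that if it mattered.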