[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281105#comment-13281105
 ] 

Robert Muir commented on LUCENE-4055:
-------------------------------------

Just some updates from the work in the branch (scary changes but proceeding 
nicely since Mike jumped in and did a lot of it).
Here's a list of the current progress:

* on disk, the segments_N is reduced to the stuff that actually is per-commit: 
a list of segments and deleted gens/counts, etc.
* per-segment metadata (doc count, diagnostics, etc) that is write-once is 
encoded by the codec, e.g. for 4.0's codec this is in the .si file.
* removed backwards-seeking on segments_N, so AppendingCodec still works but 
doesn't need any special hacks.
* flush/merge order is changed so that fieldinfos are written last so codecs 
have a chance to add metadata to it.
* fieldinfo has a "codec metadata" api that codec components can use, and that 
metadata will be available on reading the segment. this metadata 
  is for the codec to use to extend fieldinfo; it's not carried along during 
merge or anything. 
* PerFieldPostingsFormat is changed to use the fieldinfo metadata api rather 
than a separate .per file (e.g. it records that the "id" field uses Pulsing).
* all the hairiness involving files() is gone: instead we simply 
track which files were created, and add them to the .si file. Previously
  there was a lot of logic to compute this in a symmetric way at both read and 
write time, and if you had a bug, your punishment was a FileNotFoundException.
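
To illustrate the last point, here is a minimal sketch of the "track which files 
were created" idea. It is not the code in the branch: TrackingDirectorySketch and 
its methods are hypothetical names, and the "directory" is just an in-memory map 
so the example stays self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory stand-in for a directory wrapper; not Lucene's API.
final class TrackingDirectorySketch {
    // Insertion-ordered so the recorded file list is deterministic.
    private final Map<String, ByteArrayOutputStream> files = new LinkedHashMap<>();

    // Every file a codec component creates passes through here and is recorded.
    ByteArrayOutputStream createOutput(String name) {
        if (files.containsKey(name)) {
            throw new IllegalStateException("file already exists: " + name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        files.put(name, out);
        return out;
    }

    // The names that would be stored with the per-segment metadata after flush.
    Set<String> createdFiles() {
        return Collections.unmodifiableSet(files.keySet());
    }
}
```

With this shape the reader never has to recompute the file list: it reads back 
exactly the names that were recorded at write time, so the two sides cannot 
drift apart.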

not yet done:
* add metadata api to segmentinfo too, so that codec components can record 
per-segment information that they care about.
* see if we can implement 3.x's shared doc stores support with segmentinfo 
metadata api. This is tricky to do while keeping addIndexes/IndexSplitter etc, 
  which do sneaky things, still working.
* see if we can implement 3.x normGen (separate norms) with segmentinfo 
metadata. while in 3.x lucene this was actually per-commit, since 3.x support
  is read-only we can effectively treat it as per-segment this way.
* rename stuff so that we have a clearer naming for some of these classes.

I'm also probably missing a few other things. In general I'm pretty happy with 
the "metadata" key-value attributes api versus subclassing. 

I tried to make subclassing work, but subclassing turned really ugly fast and 
made various codec components too tightly-coupled, e.g. 
if someone wants to combine a CompressedStoredFields with a 
PerFieldPostingsFormat and SpecialTermVectors, what would the impls be :). 

So the overly simple Map<String,String> avoids these issues, and hey it's just 
metadata after all so I don't think anything more complex is really needed. 
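
As a rough sketch of why the key-value approach composes better than subclassing: 
each codec component writes its own keys into the same map, so no component needs 
to know about the others. FieldInfoSketch, AttributesDemo, and the key names below 
are hypothetical and only stand in for the real classes in the branch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplification of a per-field info object; not Lucene's class.
final class FieldInfoSketch {
    final String name;
    private final Map<String, String> attributes = new HashMap<>();

    FieldInfoSketch(String name) {
        this.name = name;
    }

    // Returns the previous value for the key, or null if there was none.
    String putAttribute(String key, String value) {
        return attributes.put(key, value);
    }

    // Returns null when no component recorded this key.
    String getAttribute(String key) {
        return attributes.get(key);
    }
}

final class AttributesDemo {
    // Two independent components annotate the same field without sharing a type.
    static FieldInfoSketch describeIdField() {
        FieldInfoSketch id = new FieldInfoSketch("id");
        // Keys are prefixed by component name (an assumed convention) to avoid clashes.
        id.putAttribute("PerFieldPostingsFormat.format", "Pulsing");
        id.putAttribute("SpecialTermVectors.mode", "compact");
        return id;
    }
}
```

A subclass-based design would need one FieldInfo subclass per combination of 
components; here each component only reads back the keys it wrote.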

                
> Refactor SegmentInfo / FieldInfo to make them extensible
> --------------------------------------------------------
>
>                 Key: LUCENE-4055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4055
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>
> After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes 
> should be made abstract so that they can be extended by Codec-s.
