[ 
https://issues.apache.org/jira/browse/LUCENE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3728:
--------------------------------

    Attachment: LUCENE-3728.patch

Attached is a patch showing the differences between branch and trunk.

* separateFiles() is removed; codecs calculate all of files() themselves.
* shared (compound) doc store stuff is almost totally underneath Lucene3xCodec. 
(with one exception, see below)
* indexing code no longer violates the codec's abstractions by reaching into the 
different components of files(), except IndexWriter.copySegmentAsIs (and only 
for preflex, only in the case of shared doc stores).
* lucene3x/lucene4x stored fields implementation is split, so that lucene3x can 
handle shared docstores itself.
* codec.files is consistent about directory; in fact it doesn't take a directory 
at all, as it's unnecessary.
* hairy code for compound shared docstores is removed from SegmentCoreReaders 
and now lives only in the 3.x impls.

I think this is ready to go, and it's a lot easier to see what's happening with 
files(). 

In my opinion a good future issue here would be to add a heuristic to TieredMP 
to more aggressively target 3.x segments (at least if they have shared 
docstores). It could do evil things like check File.length or whatever it wants 
as part of its heuristic, but only for preflexcodec.
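
As a rough illustration of what such a bias could look like, here is a toy sketch. Everything in it is hypothetical: the class, the fields and the 0.5 factors are made up for the example, and it does not reflect how TieredMergePolicy actually scores merges. It only shows the idea of ranking 3.x segments (and especially those with shared docstores) ahead of others of similar size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy sketch only; not Lucene API. Names and factors are assumptions. */
public class LegacySegmentBias {

    /** Hypothetical stand-in for the per-segment facts a heuristic might look at. */
    public static final class Segment {
        public final String name;
        public final long sizeBytes;
        public final boolean is3x;
        public final boolean sharedDocStores;

        public Segment(String name, long sizeBytes, boolean is3x, boolean sharedDocStores) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.is3x = is3x;
            this.sharedDocStores = sharedDocStores;
        }
    }

    /** Lower value means "merge sooner". The discount factors are arbitrary. */
    public static double mergePriority(Segment s) {
        double score = s.sizeBytes;
        if (s.is3x) {
            score *= 0.5;            // prefer rewriting old-format segments
            if (s.sharedDocStores) {
                score *= 0.5;        // prefer them even more if docstores are shared
            }
        }
        return score;
    }

    /** Returns a copy of the input ordered by mergePriority, lowest first. */
    public static List<Segment> rank(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingDouble(LegacySegmentBias::mergePriority));
        return sorted;
    }
}
```

With segments of 100, 180 and 300 bytes, where the last two are 3.x and only the 300-byte one has shared docstores, the 300-byte segment would rank first under this sketch.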
                
> better handling of files inside/outside CFS by codec
> ----------------------------------------------------
>
>                 Key: LUCENE-3728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3728
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3728.patch
>
>
> Since norms and deletes were moved under Codec (LUCENE-3606, LUCENE-3661),
> we never really properly addressed the issue of how Codec.files() should work,
> considering these files are always stored outside of CFS.
> LUCENE-3606 added a hack, LUCENE-3661 cleaned up the hack a little bit more,
> but it's still a hack.
> Currently the logic in SegmentInfo.files() is:
> {code}
> clearCache()
> if (compoundFile) {
>   // don't call Codec.files(), hardcoded CFS extensions, etc
> } else {
>   Codec.files()
> }
> // always add files stored outside CFS regardless of CFS setting
> Codec.separateFiles()
> if (sharedDocStores) {
>   // hardcoded shared doc store extensions, etc
> }
> {code}
> Also various codec methods take a Directory parameter, but it's inconsistent
> what this Directory is in the case of CFS: for some parts of the index it's
> the CFS directory, for others (deletes, separate norms) it's not.
> I wonder if instead we could restructure this so that SegmentInfo.files() 
> logic is:
> {code}
> clearCache()
> Codec.files()
> {code}
> and so that Codec is instead responsible.
> Codec.files logic by default would do the if (compoundFile) thing, the
> Lucene3x codec itself would only have the if (sharedDocStores) thing, and any
> part of the codec that wants to always put stuff outside of CFS (e.g.
> Lucene3x separate norms, deletes) could just use SegmentInfo.dir.
> Directory parameters in the case of CFS would always consistently be
> the CFSDirectory.
> I haven't totally tested whether this will work, but there are definitely some
> cleanups we can do either way, and I think it would be a good step to try to
> clean this up and simplify it.
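
To make the proposed split of responsibilities concrete, here is a minimal, self-contained sketch. It is not the patch and does not use the real Lucene classes: SegmentInfoSketch, CodecSketch and Lucene3xCodecSketch are hypothetical stand-ins, and the file extensions are only illustrative. It assumes the default codec does the if (compoundFile) branch and adds files that always live outside CFS, while only the 3.x codec knows about shared doc stores.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified stand-ins; not the real Lucene classes. */
public class CodecFilesSketch {

    /** Minimal stand-in for SegmentInfo: only what files() needs here. */
    public static final class SegmentInfoSketch {
        public final String name;
        public final boolean compoundFile;
        public final boolean hasDeletions;
        public final String sharedDocStoreSegment; // null when doc stores are not shared

        public SegmentInfoSketch(String name, boolean compoundFile,
                                 boolean hasDeletions, String sharedDocStoreSegment) {
            this.name = name;
            this.compoundFile = compoundFile;
            this.hasDeletions = hasDeletions;
            this.sharedDocStoreSegment = sharedDocStoreSegment;
        }
    }

    /** Default behaviour: the codec alone decides the full file list. */
    public static class CodecSketch {
        public Set<String> files(SegmentInfoSketch info) {
            Set<String> files = new LinkedHashSet<>();
            if (info.compoundFile) {
                files.add(info.name + ".cfs");     // everything packed into the CFS
            } else {
                files.add(info.name + ".fnm");     // illustrative per-segment files
                files.add(info.name + ".frq");
            }
            // Files that always live outside CFS, regardless of the CFS setting.
            if (info.hasDeletions) {
                files.add(info.name + ".del");
            }
            return files;
        }
    }

    /** Only the 3.x codec knows about shared doc stores. */
    public static class Lucene3xCodecSketch extends CodecSketch {
        @Override
        public Set<String> files(SegmentInfoSketch info) {
            Set<String> files = super.files(info);
            if (info.sharedDocStoreSegment != null) {
                files.add(info.sharedDocStoreSegment + ".cfx"); // illustrative extension
            }
            return files;
        }
    }
}
```

Under these assumptions, SegmentInfo.files() would reduce to clearing its cache and delegating to the codec, which is the two-line logic proposed above.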

