[jira] Commented: (LUCENE-2789) Let codec decide to use compound file system or not

Simon Willnauer (JIRA) Thu, 02 Dec 2010 04:50:37 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966094#action_12966094 ]


Simon Willnauer commented on LUCENE-2789:
-----------------------------------------

bq. We'll still need to copypaste CFS handling code to each new Codec :/
I don't think that is true at all. Since files are added to a CFS only after 
they have been written, we can implement that in a base class. The same is 
really true for reading from it.

bq. I'd like to see a switch like setNeverEverUseCompoundFiles(true) 
somewhere.
If we push it to the codec, you can just write your own CodecProvider that 
disables it on all codecs.
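
To make the CodecProvider idea concrete, here is a minimal sketch. It does not use the real Lucene classes: SketchCodec, SketchCodecProvider, NoCfsCodecProvider and the useCompoundFile hook are all hypothetical names invented for illustration, and the sketch assumes the CFS decision has already been moved into the codec.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a codec; not Lucene's actual Codec class.
interface SketchCodec {
    String name();

    // Assumed hook: the codec itself decides whether a segment goes into a CFS.
    boolean useCompoundFile(long segmentSizeBytes);
}

// Hypothetical stand-in for a provider that looks codecs up by name.
class SketchCodecProvider {
    private final Map<String, SketchCodec> codecs = new HashMap<>();

    void register(SketchCodec codec) {
        codecs.put(codec.name(), codec);
    }

    SketchCodec lookup(String name) {
        SketchCodec codec = codecs.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec: " + name);
        }
        return codec;
    }
}

// Provider that wraps every codec it hands out so that CFS is never used,
// which is roughly what a setNeverEverUseCompoundFiles(true) switch would do.
class NoCfsCodecProvider extends SketchCodecProvider {
    @Override
    SketchCodec lookup(String name) {
        final SketchCodec delegate = super.lookup(name);
        return new SketchCodec() {
            @Override
            public String name() {
                return delegate.name();
            }

            @Override
            public boolean useCompoundFile(long segmentSizeBytes) {
                return false; // ignore whatever the wrapped codec would decide
            }
        };
    }
}
```

With something like this, the switch lives in one place (the provider) and no individual codec has to know about it.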

bq. Can we move the CFS code outside and the codec simply calls a 
class/component/whatever during merging and say: I have these files and want to 
create a CFS out of it? For reading something similar.
That is essentially what I think we need to do.
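
As a rough sketch of such a component: the helper below takes the files a codec says it has, packs them into one blob, and can read them back. CompoundFileSketch and its byte layout are made up for illustration and are not Lucene's actual compound file format or API; files are held in memory only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that lives outside any codec.
final class CompoundFileSketch {
    private CompoundFileSketch() {}

    // Writing side: the codec hands over the files it produced, keyed by name.
    // Assumed layout: file count, then (name, length, bytes) per file.
    static byte[] pack(Map<String, byte[]> files) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(files.size());
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reading side: recover the individual files from the packed bytes.
    static Map<String, byte[]> unpack(byte[] compound) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(compound));
            int count = in.readInt();
            Map<String, byte[]> files = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] data = new byte[in.readInt()];
                in.readFully(data);
                files.put(name, data);
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A codec would only call pack with its file list after a flush or merge, and the reader side would call unpack, so neither needs its own copy of the CFS handling code.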

> Let codec decide to use compound file system or not
> ---------------------------------------------------
>
>                 Key: LUCENE-2789
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2789
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Codecs, Index
>            Reporter: Simon Willnauer
>
> While working on LUCENE-2186 and in the context of recent [mails | 
> http://www.lucidimagination.com/search/document/e75cfa6050d5176/consolidate_mp_and_lmp#97c69a198952ebaa]
> about consolidating MergePolicy and LogMergePolicy, I want to propose a rather 
> big change to how compound files (CFS) are created and handled in IW. Since 
> codecs were introduced, we have several somewhat different ways of writing 
> data to the index. The Sep codec, for instance, writes different files for 
> index data, and DocValues will write one file per field and segment. 
> Eventually codecs need more control over how files are written, i.e. whether 
> CFS should be used or not is IMO really a matter of the codec used for writing.
> On the other hand, when you look at IW internals, CFS really pollutes the 
> indexing code and relies on information from inside a codec (see 
> SegmentWriteState.flushedFiles). This differentiation actually spreads across 
> many classes related to indexing, including LogMergePolicy. IMO how newly 
> flushed segments are written has nothing to do with MP in the first place, yet 
> MP currently chooses whether a newly flushed segment is CFS or not (correct me 
> if I am wrong). Pushing all this logic down to codecs would make lots of code 
> much easier and cleaner.
> As Mike said, this would also reduce the API footprint if we make it private 
> to the codec. I can imagine some situations where you really want control 
> over certain fields being stored as non-CFS and others being stored as CFS. 
> Codecs might need more information about other segments during a merge to 
> decide whether or not to use CFS based on segment size, but we can easily 
> change that API. From a reading point of view we already have Codec#files, 
> which can decide case by case which files belong to this codec.
> Let me know your thoughts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
