[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088335#comment-13088335
 ] 

Uwe Schindler commented on LUCENE-3218:
---------------------------------------

Hi, after thinking the whole thing through once more, I may have a 
solution that again decouples CFS from the parent directory, so any CFS can be 
created with one single class (though the factory in Directory may still be 
worth keeping for customization). There are several options, but most of them 
have customization problems:
- The current approach was discussed already, nothing more to say.
- One way to let MMap map only certain parts of the file is to move 
getIndexInputSlice up to the abstract Directory base class and make the 
default implementation the current CFIndexInput from the default CFS impl. 
This would even be backwards compatible. The CFS impl could then simply ask 
the parent directory it wraps for a slice. The problem here is simple: the 
current CFS impl opens the CFS file exactly once and consumes exactly one file 
handle, and all slices work on that same handle. If we move the slice handling 
up to the directory, that "state" is gone, so the permanently open CFS file 
can no longer be managed. If we used a new file handle for each slice, we 
would gain nothing (the point of CFS is to reduce file handles).
- Last night I had an idea that might fix this issue. Let's move the slice 
handling into the abstract IndexInput base class; again, the default impl would 
simply use the current CFIndexInput to return a slice. MMapIndexInput would 
simply return a remapped slice on the current file handle. The only thing that 
would change is that the RAF would be kept open the whole time (like 
MMapCFDirectory does), in contrast to the current behavior, where the RAF is 
closed directly after mapping. This approach would allow the CFS impl to 
simply ask its parent directory for an IndexInput on the CFS file itself, and 
then ask this IndexInput for each sub-slice.

The last approach seems reasonable, but we need to check in more detail how to 
implement it. It keeps both "features" of CFS:
- One OS file handle
- The possibility for certain directory implementations to return sliced 
IndexInputs in an optimal way. The current IndexInput has a clone method; in 
this case we would need a similar method that takes an offset and a length.
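
To make the idea a bit more concrete, here is a rough sketch in plain Java. 
All names (SketchInput, BufferInput, slice) are made up for illustration and 
are not actual Lucene classes or signatures; a ByteBuffer stands in for the 
mmapped file. The base class provides a generic default slice, and the 
"mapped" subclass overrides it to return a view on the same underlying buffer:

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for IndexInput; not a Lucene API.
abstract class SketchInput {
    abstract byte readByte(long pos);
    abstract long length();

    // Default slice: a view that delegates to the parent input with an offset,
    // similar in spirit to what the current CFIndexInput does.
    SketchInput slice(final long sliceOffset, final long sliceLength) {
        if (sliceOffset < 0 || sliceLength < 0 || sliceOffset + sliceLength > length()) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        final SketchInput parent = this;
        return new SketchInput() {
            @Override byte readByte(long pos) {
                if (pos < 0 || pos >= sliceLength) {
                    throw new IndexOutOfBoundsException("pos=" + pos);
                }
                return parent.readByte(sliceOffset + pos);
            }
            @Override long length() { return sliceLength; }
        };
    }
}

// Stand-in for an mmap-backed input: slice() returns a view over the same
// underlying buffer instead of wrapping reads through the parent.
final class BufferInput extends SketchInput {
    private final ByteBuffer buf;

    BufferInput(ByteBuffer buf) { this.buf = buf; }

    @Override byte readByte(long pos) { return buf.get((int) pos); }
    @Override long length() { return buf.limit(); }

    @Override SketchInput slice(long sliceOffset, long sliceLength) {
        if (sliceOffset < 0 || sliceLength < 0 || sliceOffset + sliceLength > length()) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        ByteBuffer dup = buf.duplicate();          // shares content, independent position/limit
        dup.position((int) sliceOffset);
        dup.limit((int) (sliceOffset + sliceLength));
        return new BufferInput(dup.slice());       // view starting at sliceOffset
    }
}
```

Callers only ever see slice(offset, length); whether the slice is a generic 
wrapper or an optimized view is decided by the input implementation.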

On the other hand, we could remove the "factory" for CFS files from Directory 
and go back to a simple new CFSDirectory(parentDirectory, cfsName).
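
As a sketch of what such a factory-free CFS class could look like (again 
hypothetical names only: SketchParent and SketchCompoundReader are not Lucene 
classes, a ByteBuffer stands in for the single open handle, and the entry 
table is passed in rather than read from the file):

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical stand-in for the parent directory's "open this file" call.
interface SketchParent {
    ByteBuffer open(String fileName);
}

// Hypothetical compound-file reader: one open call, many slices.
final class SketchCompoundReader {
    private final ByteBuffer handle;            // the single handle on the compound file
    private final Map<String, int[]> entries;   // sub-file name -> {offset, length}

    SketchCompoundReader(SketchParent parent, String cfsName, Map<String, int[]> entries) {
        this.handle = parent.open(cfsName);     // the only time the parent is asked to open
        this.entries = entries;
    }

    // Returns a view on one sub-file without opening anything new.
    ByteBuffer openSubFile(String name) {
        int[] e = entries.get(name);
        if (e == null) {
            throw new IllegalArgumentException("no such sub-file: " + name);
        }
        ByteBuffer view = handle.duplicate();
        view.position(e[0]);
        view.limit(e[0] + e[1]);
        return view.slice();
    }
}
```

The reader does not need to know which directory implementation it sits on; 
it only asks the parent once for the compound file and then slices.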

Does this sound reasonable?

> Make CFS appendable  
> ---------------------
>
>                 Key: LUCENE-3218
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3218
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 3.4, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
> LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
> LUCENE-3218_tests.patch
>
>
> Currently CFS is created once all files are written during a flush / merge. 
> Once on disk the files are copied into the CFS format which is basically 
> unnecessary for some of the files. We can at any time write at least one file 
> directly into the CFS which can save a reasonable amount of IO. For instance 
> stored fields could be written directly during indexing and during a Codec 
> Flush one of the written files can be appended directly. This optimization is 
> a nice side effect for Lucene indexing itself but more important for DocValues 
> and LUCENE-3216: we could transparently pack per field files into a single 
> file only for docvalues without changing any code once LUCENE-3216 is 
> resolved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
