[
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088335#comment-13088335
]
Uwe Schindler commented on LUCENE-3218:
---------------------------------------
Hi, after thinking the whole thing through once more, I may have a solution
that decouples CFS from the parent directory again, so that any CFS can be
created using one single class (although the factory in Directory may still be
useful to make it customizable). There are several solutions, but most of them
have customization problems:
- The current approach was discussed already, nothing more to say
- One way to let MMap map only certain parts of the file is to move
getIndexInputSlice up to the abstract Directory base class and make the default
implementation the current CFIndexInput from the default CFS impl. This would
even be backwards compatible. The CFS impl could then simply ask the parent
directory it wraps for a slice. The problem here is simple: the current CFS
impl opens the CFS file exactly once and consumes exactly one file handle. The
slices work on that same file handle. If we move the slice handling up to the
directory, the "state" is gone, so the permanently open CFS file can no longer
be managed. When using a new file handle for each slice, we gain nothing (the
point of CFS is to reduce file handles).
- Last night I had an idea that might fix this issue. Let's move the slice
handling into the abstract IndexInput base class; again, the default impl would
simply use the current CFIndexInput to return a slice. MMapIndexInput would
instead return a remapped slice on the current file handle. The only thing that
would change is that the RAF would be kept open the whole time (like
MMapCFDirectory does), in contrast to the current behavior, where the RAF is
closed directly after mapping. This approach would allow the CFS impl to simply
ask its parent directory for an IndexInput that handles the CFS file itself and
then ask this IndexInput for each sub-slice.
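
To make the third option concrete, here is a minimal Java sketch of the idea.
It is not Lucene's API: SliceableInput, BufferInput, slice(offset, length) and
checkBounds are hypothetical names used only for illustration, and a plain
ByteBuffer stands in for the memory-mapped file. The base class provides a
generic delegating slice (playing the role of the current CFIndexInput), and
the buffer-backed subclass overrides it to return a view on the same
underlying buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an IndexInput-like base class (not Lucene's API).
abstract class SliceableInput {
    abstract long length();
    abstract byte readByte(long pos);

    // Default slice: a view that delegates every read to this input,
    // shifted by the slice offset. No additional file handle is needed.
    SliceableInput slice(final long offset, final long sliceLength) {
        checkBounds(offset, sliceLength, length());
        final SliceableInput parent = this;
        return new SliceableInput() {
            @Override long length() { return sliceLength; }
            @Override byte readByte(long pos) {
                if (pos < 0 || pos >= sliceLength) {
                    throw new IndexOutOfBoundsException("pos=" + pos);
                }
                return parent.readByte(offset + pos);
            }
        };
    }

    static void checkBounds(long offset, long sliceLength, long total) {
        if (offset < 0 || sliceLength < 0 || offset + sliceLength > total) {
            throw new IllegalArgumentException(
                "slice [" + offset + ", +" + sliceLength + ") exceeds length " + total);
        }
    }
}

// Buffer-backed input (assumption: a MappedByteBuffer could be passed in the
// same way). It overrides slice() to hand out a view of the same buffer.
final class BufferInput extends SliceableInput {
    private final ByteBuffer buffer;

    BufferInput(ByteBuffer buffer) { this.buffer = buffer; }

    @Override long length() { return buffer.limit(); }

    @Override byte readByte(long pos) { return buffer.get((int) pos); }

    @Override SliceableInput slice(long offset, long sliceLength) {
        checkBounds(offset, sliceLength, length());
        ByteBuffer view = buffer.duplicate();   // shares content, own position/limit
        view.position((int) offset);
        view.limit((int) (offset + sliceLength));
        return new BufferInput(view.slice());   // index 0 of the slice == offset
    }
}
```

In this sketch a slice of a slice keeps working, because both the default and
the overridden implementation return another SliceableInput.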
The last approach seems reasonable, but we need some more checks on how to
implement it. It keeps both "features" of CFS:
- One OS file handle
- The possibility for certain directory implementations to return sliced
IndexInputs in an optimal way. The current IndexInput has a clone method; here
we would need a similar method that takes an offset and a length.
On the other hand, we can remove the "factory" for CFS files from Directory and
go back to a simple new CFSDirectory(parentDirectory, cfsName).
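
As a rough illustration of that constructor shape, the following Java sketch
opens the CFS file exactly once through the parent and serves every sub-file as
a slice of that single input. Everything here is an assumption made for the
example: ParentDirectory, CountingDirectory, the addEntry method and the use of
ByteBuffer are hypothetical, and a real CFS would read its entry table from the
file instead of having entries registered by hand.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal parent directory: returns a read-only view of a file.
interface ParentDirectory {
    ByteBuffer openInput(String name);
}

// Hypothetical in-memory parent that counts how often a file is opened,
// as a stand-in for counting OS file handles.
final class CountingDirectory implements ParentDirectory {
    private final Map<String, byte[]> files = new HashMap<>();
    int openCount = 0;

    void put(String name, byte[] data) { files.put(name, data); }

    @Override public ByteBuffer openInput(String name) {
        byte[] data = files.get(name);
        if (data == null) throw new IllegalArgumentException("no such file: " + name);
        openCount++;
        return ByteBuffer.wrap(data).asReadOnlyBuffer();
    }
}

// Sketch of a CFS reader created without any factory on the parent.
final class CFSDirectory {
    private final ByteBuffer cfs;                        // the single open input
    private final Map<String, int[]> entries = new HashMap<>();

    CFSDirectory(ParentDirectory parent, String cfsName) {
        this.cfs = parent.openInput(cfsName);            // opened exactly once
    }

    // Assumption: entries are registered by hand to keep the sketch short.
    void addEntry(String name, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > cfs.limit()) {
            throw new IllegalArgumentException("entry outside CFS file: " + name);
        }
        entries.put(name, new int[] {offset, length});
    }

    // Each sub-file is a slice of the one open input; the parent is not asked again.
    ByteBuffer openInput(String name) {
        int[] e = entries.get(name);
        if (e == null) throw new IllegalArgumentException("no such sub-file: " + name);
        ByteBuffer view = cfs.duplicate();
        view.position(e[0]);
        view.limit(e[0] + e[1]);
        return view.slice();
    }
}
```

However many sub-files are opened, the parent's openCount stays at 1 in this
sketch, which mirrors the "one OS file handle" property.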
Does this sound reasonable?
> Make CFS appendable
> ---------------------
>
> Key: LUCENE-3218
> URL: https://issues.apache.org/jira/browse/LUCENE-3218
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Affects Versions: 3.4, 4.0
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Blocker
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch,
> LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch,
> LUCENE-3218_tests.patch
>
>
> Currently CFS is created once all files are written during a flush / merge.
> Once on disk, the files are copied into the CFS format, which is basically
> unnecessary for some of the files. We can at any time write at least one file
> directly into the CFS, which can save a reasonable amount of IO. For instance,
> stored fields could be written directly during indexing, and during a Codec
> flush one of the written files can be appended directly. This optimization is
> a nice side effect for Lucene indexing itself but more important for DocValues
> and LUCENE-3216: we could transparently pack per-field files into a single
> file only for docvalues without changing any code once LUCENE-3216 is
> resolved.