[ 
https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049335#comment-13049335
 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

Hi Robert, great patch, exactly what I had hoped for when we 
discussed it!

Patch looks fine, one small bug:
- FileSwitchDirectory should also override openCompoundInput() from 
Directory and delegate to the correct underlying directory. Currently it always 
uses the default impl, which does double buffering. So if you e.g. put 
MMapDirectory as a delegate for CFS files, those files would be opened like 
before your patch. Just copy'n'paste the code from one of the other 
FileSwitchDirectory methods.
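
A minimal sketch of the delegation described above, not the actual patch. The classes are self-contained stand-ins (SketchDirectory and friends are hypothetical names, not Lucene classes), and openCompoundInput() returns a descriptive String instead of a Directory so that the routing is easy to see:

```java
import java.util.Collections;
import java.util.Set;

/** Stand-in for Directory; only the method under discussion is modeled. */
class SketchDirectory {
    /** Default behavior: the generic, double-buffered compound reader. */
    String openCompoundInput(String name) {
        return "generic-buffered(" + name + ")";
    }
}

/** Stand-in for a directory with its own compound handling, e.g. mmap. */
class SketchMMapDirectory extends SketchDirectory {
    @Override
    String openCompoundInput(String name) {
        return "mmap-slice(" + name + ")";
    }
}

/** Stand-in for FileSwitchDirectory: routes each file by its extension. */
class SketchFileSwitchDirectory extends SketchDirectory {
    private final Set<String> primaryExtensions;
    private final SketchDirectory primary;
    private final SketchDirectory secondary;

    SketchFileSwitchDirectory(Set<String> primaryExtensions,
                              SketchDirectory primary,
                              SketchDirectory secondary) {
        this.primaryExtensions = primaryExtensions;
        this.primary = primary;
        this.secondary = secondary;
    }

    private static String getExtension(String name) {
        int i = name.lastIndexOf('.');
        return i == -1 ? "" : name.substring(i + 1);
    }

    private SketchDirectory getDirectory(String name) {
        return primaryExtensions.contains(getExtension(name)) ? primary : secondary;
    }

    /** The missing override: delegate instead of inheriting the default. */
    @Override
    String openCompoundInput(String name) {
        return getDirectory(name).openCompoundInput(name);
    }
}

class FileSwitchSketchDemo {
    public static void main(String[] args) {
        SketchFileSwitchDirectory dir = new SketchFileSwitchDirectory(
                Collections.singleton("cfs"),
                new SketchMMapDirectory(),
                new SketchDirectory());
        System.out.println(dir.openCompoundInput("_0.cfs"));
    }
}
```

Without the override, the switch directory would inherit the default and ignore whatever its delegate can do for CFS files.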

Some suggestions:
We currently map the whole compound file into address space, read the 
header/contents and unmap it again. This can add overhead, especially if 
unmapping is not supported.
- We could use SimpleFSIndexInput to read the CFS contents (we only need to pass 
the already open RAF there; alternatively use Dawid's new wrapper IndexInput 
around a standard InputStream, obtained from the RAF -> LUCENE-3202)
- Only map the header of the CFS file; the problem: we don't know its exact size.
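
To illustrate the first suggestion, here is a rough sketch of reading a compound file's entry table through an already open RandomAccessFile, so nothing has to be mapped and unmapped just for the header. The on-disk layout (an int entry count, then a long offset and a UTF name per entry) is a simplified assumption for this example, not the real CFS format, and CfsHeaderSketch is a hypothetical name:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

class CfsHeaderSketch {
    /**
     * Reads the entry table via plain file reads on the given, already open
     * file. Assumed toy layout: int count, then (long offset, UTF name) pairs.
     */
    static Map<String, Long> readEntries(RandomAccessFile raf) throws IOException {
        raf.seek(0);
        int count = raf.readInt();
        Map<String, Long> offsets = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            long offset = raf.readLong();
            String name = raf.readUTF();
            offsets.put(name, offset);
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sketch", ".cfs");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            // Write a toy header with two entries, then read it back.
            raf.writeInt(2);
            raf.writeLong(100L);
            raf.writeUTF("_0.fnm");
            raf.writeLong(164L);
            raf.writeUTF("_0.frq");
            System.out.println(readEntries(raf));
        }
    }
}
```

Unbuffered RandomAccessFile reads are slow per call, but the header is small, so this may still be cheaper than a map/unmap cycle; a buffered wrapper would be the obvious refinement.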

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch
>
>
> Currently CompoundFileReader could use some improvements; I see the following 
> problems:
> * its CSIndexInput extends BufferedIndexInput, which is stupid for 
> directories like mmap.
> * it seeks on every readInternal
> * it's not possible for a directory to override or improve the handling of 
> compound files.
> For example: it seems if you were impl'ing this thing from scratch, you would 
> just wrap the II directly (not extend BufferedIndexInput),
> add compound file offset X to seek() calls, and override length(). But of 
> course, then you couldn't always throw read past EOF when you should,
> as a user could read into the next file and be left unaware.
> However, some directories could handle this better. For example, MMapDirectory 
> could return an IndexInput that simply mmaps the 'slice' of the CFS file.
> Its underlying ByteBuffer naturally does bounds checks already, so it 
> wouldn't need to be buffered, and wouldn't even need to add any offsets to 
> seek(), as its position would just work.
> So I think we should try to refactor this so that a Directory can customize 
> how compound files are handled, the simplest 
> case for the least code change would be to add this to Directory.java:
> {code}
>   public Directory openCompoundInput(String filename) {
>     return new CompoundFileReader(this, filename);
>   }
> {code}
> Because most code depends upon the fact that compound files are implemented 
> as a Directory and are transparent. At least then a subclass could override...
> but the 'recursion' is a little ugly... we could still label it 
> expert+internal+experimental or whatever.
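
The 'slice' idea from the quoted description can be sketched with a plain ByteBuffer. This is only an illustration under assumptions: SliceSketch is a hypothetical helper, a heap buffer stands in for a mapped one, and a real mmap-backed input would also have to deal with files larger than a single buffer:

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

class SliceSketch {
    /**
     * Returns a view of [offset, offset + length) of the whole buffer.
     * The view's positions start at 0, and reads past its end fail on
     * their own, so no offset arithmetic or extra EOF check is needed.
     */
    static ByteBuffer slice(ByteBuffer whole, int offset, int length) {
        ByteBuffer dup = whole.duplicate(); // leaves the caller's position alone
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }

    public static void main(String[] args) {
        // Pretend bytes 2..4 of the "compound file" are one sub-file.
        ByteBuffer whole = ByteBuffer.wrap(new byte[] {0, 1, 2, 3, 4, 5, 6, 7});
        ByteBuffer sub = slice(whole, 2, 3);
        sub.position(2); // a seek within the sub-file, no offset added
        System.out.println(sub.get());
        try {
            sub.get(); // would read into the next sub-file
        } catch (BufferUnderflowException e) {
            System.out.println("read past EOF detected by the buffer itself");
        }
    }
}
```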

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
