[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046635#comment-13046635 ]
Michael McCandless commented on LUCENE-2793:
--------------------------------------------

bq. It seems that we don't need to provide IOContext to FieldInfos and SegmentInfo since we are reading them into memory anyway. I think you can just use a default context here without changing the constructors. Same is true for SegmentInfo

I think we should pass down "readOnce=true" for these cases? EG some kind of caching dir (or something) would know not to bother caching such files... Same for del docs, terms index, doc values (well, sometimes), etc.

bq. it seems that we should communicate the IOContext to the codec somehow. I suggest we put IOContext to SegmentWriteState and SegmentReadState that way we don't need to change the Codec interface and clutter it with internals. This would also fix Mike's comment for FieldsConsumer etc.

+1 that's great.

bq. I really don't like OneMerge I think we should add an abstract class (maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now.

If we can't include OneMerge, and I agree it'd be nice not to, I think we should try hard to pull stuff out of OneMerge that may be of interest to a Dir impl? Maybe:

  * estimatedTotalSegmentSizeBytes
  * docCount
  * optimize/expungeDeletes
  * isExternal (so Dir can know if this is addIndexes vs "normal" merging)

bq. Regarding the IOContext class I think we should design for what we have right now and since SegmentInfo is not used anywhere (as far as I can see) we should add it once we need it. OneMerge should not go in there but rather the interface / abstract class I talked about above.

I agree, let's wait until we have a need.

In fact... SegmentInfo for flush won't work: we go and open all files for flushing, write to them, close them, and only then do we make the SegmentInfo. So it seems like we should also have some abstracted stuff about the to-be-flushed segment? Maybe for starters the estimatedSegmentSizeBytes?
EG, NRTCachingDir could use this to decide whether to cache the new segment (today it fragile-ly relies on the app to open a new NRT reader frequently enough).

> Directory createOutput and openInput should take an IOContext
> -------------------------------------------------------------
>
>                 Key: LUCENE-2793
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2793
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Varun Thacker
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch
>
>
> Today for merging we pass down a larger readBufferSize than for searching because we get better performance.
> I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc.
> Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging.
> This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible.
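
To make the shape under discussion concrete, here is a rough, self-contained sketch of how the ideas in the comment might fit together: a context object carrying a readOnce hint, a small MergeInfo pulled out of OneMerge, and a flush-time size estimate that a caching directory could consult. Everything here is an assumption for illustration only: the class IOContextSketch, the Usage enum, FlushInfo, the factory methods and the shouldCache policy are hypothetical names, not what the patch or Lucene actually defines. Only the field names estimatedTotalSegmentSizeBytes, docCount, isExternal and estimatedSegmentSizeBytes come from the comment.

```java
import java.util.Objects;

/** Hypothetical sketch of the IOContext idea; not the Lucene API. */
final class IOContextSketch {

    /** Why a file is being opened or created (assumed set of values). */
    enum Usage { DEFAULT, READ, MERGE, FLUSH }

    /** Merge details extracted from OneMerge, using the fields proposed in the comment. */
    static final class MergeInfo {
        final long estimatedTotalSegmentSizeBytes;
        final int docCount;
        final boolean optimize;    // optimize/expungeDeletes vs. a natural merge
        final boolean isExternal;  // addIndexes vs. "normal" merging

        MergeInfo(long estimatedTotalSegmentSizeBytes, int docCount,
                  boolean optimize, boolean isExternal) {
            this.estimatedTotalSegmentSizeBytes = estimatedTotalSegmentSizeBytes;
            this.docCount = docCount;
            this.optimize = optimize;
            this.isExternal = isExternal;
        }
    }

    /** What is known about a to-be-flushed segment before its SegmentInfo exists. */
    static final class FlushInfo {
        final long estimatedSegmentSizeBytes;

        FlushInfo(long estimatedSegmentSizeBytes) {
            this.estimatedSegmentSizeBytes = estimatedSegmentSizeBytes;
        }
    }

    /** Immutable context handed to createOutput/openInput in this sketch. */
    static final class IOContext {
        static final IOContext DEFAULT = new IOContext(Usage.DEFAULT, false, null, null);

        final Usage usage;
        final boolean readOnce;
        final MergeInfo mergeInfo;  // non-null only when usage == MERGE
        final FlushInfo flushInfo;  // non-null only when usage == FLUSH

        private IOContext(Usage usage, boolean readOnce, MergeInfo mergeInfo, FlushInfo flushInfo) {
            this.usage = usage;
            this.readOnce = readOnce;
            this.mergeInfo = mergeInfo;
            this.flushInfo = flushInfo;
        }

        /** For files read fully into memory once, e.g. FieldInfos or del docs. */
        static IOContext readOnce() {
            return new IOContext(Usage.READ, true, null, null);
        }

        static IOContext forMerge(MergeInfo info) {
            return new IOContext(Usage.MERGE, false, Objects.requireNonNull(info), null);
        }

        static IOContext forFlush(FlushInfo info) {
            return new IOContext(Usage.FLUSH, false, null, Objects.requireNonNull(info));
        }
    }

    /**
     * Assumed policy for a caching directory: never cache read-once files,
     * cache flushed or merged segments only if their estimated size fits,
     * and do not cache when no size estimate is available.
     */
    static boolean shouldCache(IOContext ctx, long maxCachedBytes) {
        if (ctx.readOnce) {
            return false;
        }
        switch (ctx.usage) {
            case FLUSH:
                return ctx.flushInfo.estimatedSegmentSizeBytes <= maxCachedBytes;
            case MERGE:
                return ctx.mergeInfo.estimatedTotalSegmentSizeBytes <= maxCachedBytes;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        long maxCachedBytes = 5L * 1024 * 1024;
        IOContext smallFlush = IOContext.forFlush(new FlushInfo(64 * 1024));
        IOContext bigMerge = IOContext.forMerge(new MergeInfo(1L << 30, 1_000_000, false, false));
        System.out.println("cache small flush: " + shouldCache(smallFlush, maxCachedBytes));
        System.out.println("cache big merge: " + shouldCache(bigMerge, maxCachedBytes));
        System.out.println("cache read-once file: " + shouldCache(IOContext.readOnce(), maxCachedBytes));
    }
}
```

The point of the sketch is only that the directory sees small value objects (MergeInfo, FlushInfo) rather than OneMerge or SegmentInfo, so a Dir impl could make the caching decision up front instead of depending on how often the app reopens its NRT reader.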