Hey Varun,

On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> Hi Varun,
>
> Those two issues would make a great GSoC! Comments below...

+1

>
> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
> <varunthacker1...@gmail.com> wrote:
>
>> I would like to combine two tasks as part of my project, namely
>> "Directory createOutput and openInput should take an IOContext"
>> (LUCENE-2793), and complement it with "Generalize DirectIOLinuxDir to
>> UnixDir" (LUCENE-2795).
>>
>> The first part of the project is aimed at significantly reducing the
>> time taken to search during indexing by adding an IOContext which would
>> store the buffer size and have options to bypass the OS’s buffer cache
>> (this is what causes the slowdown in search) and other hints. Once
>> completed I would move on to LUCENE-2795 and generalize the Directory
>> implementation to make a UnixDirectory.
>
> So, the first part (LUCENE-2793) should cause no change at all to
> performance, functionality, etc., because it's "merely" installing the
> plumbing (IOContext threaded throughout the low-level store APIs in
> Lucene) so that higher levels can send important details down to the
> Directory. We'd fix IndexWriter/IndexReader to fill out this
> IOContext with the details (merging, flushing, new reader, etc.).
>
> There's some fun/freedom here in figuring out just what details should
> be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
> or is it high level "I am opening a new near-real-time reader").
>
> This first step is a rote cutover, just changing APIs but in no way
> taking advantage of the new APIs.
>
> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
> by creating a UnixDir impl that, using JNI (C code), passes advanced
> flags when opening files, based on the incoming IOContext.
>
> The goal is a single UnixDir that has ifdefs so that it's usable
> across multiple Unices, and eg would use direct IO if the context is
> merging.
> If we are ambitious we could rope Windows into the mix, too,
> and then this would be NativeDir...
>
> We can measure success by validating that a big merge while searching
> does not hurt search performance? (Ie we should be able to reproduce
> the results from
> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
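
To make the "high level" flavor of IOContext a bit more concrete, here
is a rough Java sketch of how a context object could carry the reason a
file is being opened, and how a Directory might map that to a low-level
decision. This is only an illustration under assumptions: every name in
it (IOContextSketch, Usage, bufferSizeHint, useDirectIO, bufferSize) is
hypothetical and not the Lucene API, and the 4 KB default simply echoes
the example in the quoted mail.

```java
// Hypothetical sketch only: these names are NOT the actual Lucene API.
final class IOContextSketch {

    /** High-level reason a file is being opened or created (assumed set of values). */
    enum Usage { MERGE, FLUSH, READ, NRT_READ }

    /** Assumed fallback buffer size; 4 KB mirrors the example in the thread. */
    static final int DEFAULT_BUFFER_SIZE = 4096;

    final Usage usage;
    final int bufferSizeHint; // in bytes; 0 means "let the Directory decide"

    IOContextSketch(Usage usage, int bufferSizeHint) {
        if (usage == null) {
            throw new IllegalArgumentException("usage must not be null");
        }
        if (bufferSizeHint < 0) {
            throw new IllegalArgumentException("bufferSizeHint must be >= 0");
        }
        this.usage = usage;
        this.bufferSizeHint = bufferSizeHint;
    }

    /**
     * Policy a Directory implementation might apply: use direct IO only
     * when the context says the file access belongs to a merge.
     */
    static boolean useDirectIO(IOContextSketch ctx) {
        return ctx.usage == Usage.MERGE;
    }

    /** Honor an explicit buffer size hint, otherwise fall back to the default. */
    static int bufferSize(IOContextSketch ctx) {
        return ctx.bufferSizeHint > 0 ? ctx.bufferSizeHint : DEFAULT_BUFFER_SIZE;
    }

    public static void main(String[] args) {
        IOContextSketch merge = new IOContextSketch(Usage.MERGE, 0);
        IOContextSketch nrt = new IOContextSketch(Usage.NRT_READ, 8192);
        System.out.println("merge: directIO=" + useDirectIO(merge)
                + " bufferSize=" + bufferSize(merge));
        System.out.println("nrt:   directIO=" + useDirectIO(nrt)
                + " bufferSize=" + bufferSize(nrt));
    }
}
```

In this sketch the caller only says what it is doing (merging, opening
an NRT reader, ...) and the Directory decides what that means at the OS
level, which keeps platform-specific choices inside the Directory
implementation. Whether IOContext should also carry low-level knobs
like the buffer size is exactly the open design question above.
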
Thanks for the summary, Mike!

>
>> I have spoken to Michael McCandless and Simon Willnauer about
>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>> I would love to be able to contribute to and learn from the Apache
>> Lucene community this summer. Also I would love suggestions on how to
>> make my application proposal stronger.
>
> I think either Simon or I can be the "official" mentor, and then the
> other one of us (and other Lucene committers) will support/chime
> in...

I will take the official responsibility here once we are there!

simon

>
> This is an important change for Lucene!
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org