I have drafted the proposal on the official GSoC website. Here is the link to my proposal: http://goo.gl/uYXrV

Please do let me know if anything needs to be changed, added or removed.
I will keep working on it until the deadline on the 8th.

On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> That test code looks good -- you really should have seen awful performance had you used O_DIRECT, since you read byte by byte.
>
> A more realistic test is to read a whole buffer at a time (e.g. 4 KB is what Lucene now uses during merging, but we'd probably increase this to something like 1 MB when using O_DIRECT).
>
> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and for good reason: its existence means projects like ours can use it to "work around" limitations in the Linux IO APIs that control the buffer cache when, otherwise, we might conceivably make patches to fix Linux correctly. It's an escape hatch, and we all use the escape hatch instead of trying to fix Linux for real...
>
> For example, the NOREUSE flag is now a no-op in Linux, which is a shame, because that's precisely the flag we'd want to use for merging (along with SEQUENTIAL). Had that flag been implemented well, it'd give better results than our workaround using O_DIRECT.
>
> Anyway, given how things are, until we can get more control (waaaay up in Javaland) over the buffer cache, O_DIRECT (via a native Directory impl through JNI) is our only real option today.
>
> More details here:
> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>
> Note that other OSs likely do a better job and actually implement NOREUSE and similar APIs, so the generic Unix/WindowsNativeDirectory would simply use NOREUSE on these platforms for I/O during segment merging.
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker <varunthacker1...@gmail.com> wrote:
> > Hi. I wrote some sample code to test the speed difference between SEQUENTIAL and O_DIRECT reads (in place of O_DIRECT I used the madvise flag MADV_DONTNEED).
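Mike's point about reading a whole buffer at a time rather than byte by byte largely comes down to how many read(2) system calls a scan issues. The Python sketch below counts them for a given buffer size. `count_read_calls` is a hypothetical helper, not Lucene code, and it uses ordinary buffered reads rather than O_DIRECT.

```python
import os
from typing import Tuple


def count_read_calls(path: str, buffer_size: int) -> Tuple[int, int]:
    """Read a whole file with os.read() and a fixed buffer size.

    Returns (total_bytes, read_calls). read_calls includes the final
    call that returns b"" at end of file.
    """
    total = 0
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, buffer_size)  # one read(2) per iteration
            calls += 1
            if not chunk:  # b"" signals end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, calls
```

Take a 10,000-byte file as an example. A 1-byte buffer needs 10,001 calls. A 4 KB buffer needs 4: three data reads plus the end-of-file read. A 1 MB buffer needs 2.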
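The hints Mike would like to rely on for merging, SEQUENTIAL plus NOREUSE, are passed to the kernel through posix_fadvise(2). Below is a minimal Python sketch. It assumes a platform where `os.posix_fadvise` is available (Linux and some other Unices), and `read_for_merge` is a hypothetical name. As the thread notes, Linux treated NOREUSE as a no-op at the time, so on such kernels only the SEQUENTIAL hint changes anything.

```python
import os


def read_for_merge(path: str, buffer_size: int = 1024 * 1024) -> int:
    """Read a file front to back, advising the kernel as a merge would.

    Returns the number of bytes read. On platforms without
    os.posix_fadvise the hints are skipped and the read is unchanged.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        advise = getattr(os, "posix_fadvise", None)
        if advise is not None:
            # offset=0, len=0 applies the advice to the whole file.
            advise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # expect one forward scan
            advise(fd, 0, 0, os.POSIX_FADV_NOREUSE)  # data is read once (no-op on Linux in 2011)
        total = 0
        while True:
            chunk = os.read(fd, buffer_size)
            if not chunk:
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)
```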
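Varun's own test is only linked (on pastebin), so the following is not his code. It is a hedged Python sketch of the same kind of comparison: map a file, apply one madvise(2) hint, and time a scan that touches every page. It assumes Python 3.8+ on a platform that provides `mmap.madvise` (e.g. Linux), and `timed_mmap_scan` is a hypothetical name.

Note that MADV_DONTNEED does not bypass the page cache the way O_DIRECT does. For a file mapping it only drops the process's current mappings of those pages, and the data stays cached in the kernel.

```python
import mmap
import time
from typing import Tuple


def timed_mmap_scan(path: str, advice: int) -> Tuple[int, float]:
    """Map a non-empty file read-only, apply one madvise() hint, scan it.

    Returns (checksum, elapsed_seconds). The checksum is the sum of the
    first byte of every page and only proves each page was touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            m.madvise(advice)  # applies to the whole mapping
            start = time.perf_counter()
            checksum = 0
            for offset in range(0, len(m), mmap.PAGESIZE):
                checksum += m[offset]  # fault in one page per step
            return checksum, time.perf_counter() - start
```

You would compare `timed_mmap_scan(path, mmap.MADV_SEQUENTIAL)` against `timed_mmap_scan(path, mmap.MADV_DONTNEED)`. For the timings to mean anything, the page cache should be cold before each run (for example, by dropping caches as root between runs). Otherwise the second run mostly measures cached reads.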
> >
> > This is the link to the code: http://pastebin.com/8QywKGyS
> >
> > There was a speed difference when I switched between the two flags. I have not used the O_DIRECT flag because Linus had criticized it.
> >
> > Is this what the flags are intended to be used for? This is just sample code with a test file.
> >
> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
> >> Hey Varun,
> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >>> Hi Varun,
> >>>
> >>> Those two issues would make a great GSoC! Comments below...
> >> +1
> >>>
> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker <varunthacker1...@gmail.com> wrote:
> >>>
> >>>> I would like to combine two tasks as part of my project, namely "Directory createOutput and openInput should take an IOContext" (LUCENE-2793), complemented by "Generalize DirectIOLinuxDir to UnixDir" (LUCENE-2795).
> >>>>
> >>>> The first part of the project is aimed at significantly reducing the slowdown of searches while indexing is running. It would add an IOContext that stores the buffer size, options to bypass the OS’s buffer cache (merges going through that cache are what cause the slowdown in search), and other hints. Once that is completed, I would move on to LUCENE-2795 and generalize the Directory implementation to make a UnixDirectory.
> >>>
> >>> So, the first part (LUCENE-2793) should cause no change at all to performance, functionality, etc., because it's "merely" installing the plumbing (IOContext threaded throughout the low-level store APIs in Lucene) so that higher levels can send important details down to the Directory. We'd fix IndexWriter/IndexReader to fill out this IOContext with the details (merging, flushing, new reader, etc.).
> >>>
> >>> There's some fun/freedom here in figuring out just what details should be included in IOContext...
> >>> (e.g. is it low level "set buffer size to 4 KB" or is it high level "I am opening a new near-real-time reader").
> >>>
> >>> This first step is a rote cutover, just changing APIs but in no way taking advantage of the new APIs.
> >>>
> >>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing, by creating a UnixDir impl that, using JNI (C code), passes advanced flags when opening files, based on the incoming IOContext.
> >>>
> >>> The goal is a single UnixDir that has ifdefs so that it's usable across multiple Unices, and e.g. would use direct IO if the context is merging. If we are ambitious we could rope Windows into the mix, too, and then this would be NativeDir...
> >>>
> >>> We can measure success by validating that a big merge while searching does not hurt search performance? (I.e. we should be able to reproduce the results from http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html.)
> >>
> >> Thanks for the summary, Mike!
> >>>
> >>>> I have spoken to Michael McCandless and Simon Willnauer about undertaking these tasks. Michael McCandless has agreed to mentor me. I would love to be able to contribute to and learn from the Apache Lucene community this summer. I would also love suggestions on how to make my application proposal stronger.
> >>>
> >>> I think either Simon or I can be the "official" mentor, and then the other one of us (and other Lucene committers) will support/chime in...
> >>
> >> I will take the official responsibility here once we are there!
> >> simon
> >>>
> >>> This is an important change for Lucene!
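To make the LUCENE-2793 plumbing concrete, here is a minimal Python sketch of the idea Mike describes. Callers pass a high-level context (merge, flush, new near-real-time reader), and the Directory decides what that means. An optional low-level buffer-size override is also allowed.

All names here (`Context`, `IOContext`, `SimpleDirectory`, `buffer_size_for`, `open_input`) are illustrative assumptions, not Lucene's actual Java API. The 4 KB and 1 MB sizes come from the thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import BinaryIO, Optional


class Context(Enum):
    """Why a file is being opened (hypothetical set of reasons)."""
    DEFAULT = "default"
    FLUSH = "flush"
    MERGE = "merge"
    NRT_READER = "nrt_reader"


@dataclass(frozen=True)
class IOContext:
    """High-level hint passed down to openInput/createOutput."""
    context: Context = Context.DEFAULT
    buffer_size: Optional[int] = None  # optional low-level override


class SimpleDirectory:
    """Stand-in Directory that only uses the context to size buffers."""

    DEFAULT_BUFFER = 4 * 1024  # the size the thread says Lucene uses today
    MERGE_BUFFER = 1024 * 1024  # the larger merge buffer suggested in the thread

    def buffer_size_for(self, ctx: IOContext) -> int:
        if ctx.buffer_size is not None:
            return ctx.buffer_size  # an explicit low-level hint wins
        if ctx.context is Context.MERGE:
            return self.MERGE_BUFFER
        return self.DEFAULT_BUFFER

    def open_input(self, path: str, ctx: IOContext = IOContext()) -> BinaryIO:
        # A native UnixDir could also choose open() flags from ctx here.
        return open(path, "rb", buffering=self.buffer_size_for(ctx))
```

One argument for passing the high-level reason rather than only a raw buffer size is that it keeps the decision inside the Directory. That lets a native implementation pick different flags for the same call without changing IndexWriter or IndexReader again.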
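The native UnixDir Mike outlines would do its flag selection in C behind `#ifdef`s. The same decision can be sketched in Python, where `getattr(os, "O_DIRECT", 0)` plays the role of `#ifdef O_DIRECT`.

When the context is a merge and the platform has O_DIRECT, the file is opened with it. It is then read in 1 MB chunks into a page-aligned buffer (an anonymous mmap), because O_DIRECT generally requires aligned buffers, offsets and lengths.

The names `open_flags`, `_read_all` and `read_file` are hypothetical. So is the fallback to a normal read when the filesystem rejects O_DIRECT (tmpfs, for example, has historically failed such opens with EINVAL). Both are assumptions of this sketch, not part of the proposal.

```python
import mmap
import os

CHUNK = 1024 * 1024  # 1 MB: a multiple of any common page/block size


def open_flags(merging: bool) -> int:
    """Pick open() flags for a read; getattr() stands in for #ifdef O_DIRECT."""
    flags = os.O_RDONLY
    if merging:
        flags |= getattr(os, "O_DIRECT", 0)  # 0 where the platform lacks O_DIRECT
    return flags


def _read_all(fd: int) -> bytes:
    """Read fd to the end in CHUNK-sized reads into a page-aligned buffer."""
    size = os.fstat(fd).st_size
    buf = mmap.mmap(-1, CHUNK)  # anonymous mapping, so its address is page-aligned
    out = bytearray()
    try:
        # Stop once the whole file is read, so no read starts at an
        # unaligned end-of-file offset (which O_DIRECT may reject).
        while len(out) < size:
            n = os.readv(fd, [buf])
            if n == 0:
                break
            out += buf[:n]
    finally:
        buf.close()
    return bytes(out)


def read_file(path: str, merging: bool) -> bytes:
    """Read a whole file, bypassing the page cache for merges when possible."""
    flags = open_flags(merging)
    if flags != os.O_RDONLY:
        try:
            fd = os.open(path, flags)
            try:
                return _read_all(fd)
            finally:
                os.close(fd)
        except OSError:
            pass  # e.g. EINVAL: this filesystem does not accept O_DIRECT
    fd = os.open(path, os.O_RDONLY)
    try:
        return _read_all(fd)
    finally:
        os.close(fd)
```

On a filesystem that accepts O_DIRECT, `read_file(path, merging=True)` should read without populating the page cache. Elsewhere it quietly degrades to an ordinary read, much as an `#ifdef`-based native Directory would on a platform without the flag.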
> >>>
> >>> Mike
> >>
> >
> > --
> > Regards,
> > Varun Thacker
> > http://varunthacker.wordpress.com

--
Regards,
Varun Thacker
http://varunthacker.wordpress.com