I have refined my proposal here: http://goo.gl/uYXrV
Are there any suggestions I should incorporate into my proposal before today's deadline?

On Thu, Apr 7, 2011 at 9:28 AM, Varun Thacker <varunthacker1...@gmail.com> wrote:

> I have updated my proposal online to mention the time I would be able to
> dedicate to the project.
>
> On Thu, Apr 7, 2011 at 7:05 AM, Adriano Crestani <adrianocrest...@gmail.com> wrote:
>
>> Hi Varun,
>>
>> Nice proposal, very complete. Only one thing missing, you should mention
>> somewhere how many hours a week you are willing to spend working on the
>> project and whether there is any holiday you won't be able to work.
>>
>> Good luck ;)
>>
>> On Wed, Apr 6, 2011 at 5:57 PM, Varun Thacker <varunthacker1...@gmail.com> wrote:
>>
>>> I have drafted the proposal on the official GSoC website. This is the
>>> link to my proposal: http://goo.gl/uYXrV
>>> Please do let me know if anything needs to be changed, added or removed.
>>>
>>> I will keep on working on it till the deadline on the 8th.
>>>
>>> On Wed, Apr 6, 2011 at 11:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>
>>>> That test code looks good -- you really should have seen awful
>>>> performance had you used O_DIRECT since you read byte by byte.
>>>>
>>>> A more realistic test is to read a whole buffer (eg 4 KB is what
>>>> Lucene now uses during merging, but we'd probably up this to like 1 MB
>>>> when using O_DIRECT).
>>>>
>>>> Linus does hate O_DIRECT (see http://kerneltrap.org/node/7563), and
>>>> for good reason: its existence means projects like ours can use it to
>>>> "work around" limitations in the Linux IO apis that control the buffer
>>>> cache when, otherwise, we might conceivably make patches to fix Linux
>>>> correctly. It's an escape hatch, and we all use the escape hatch
>>>> instead of trying to fix Linux for real...
>>>>
>>>> For example the NOREUSE flag is a no-op now in Linux, which is a
>>>> shame, because that's precisely the flag we'd want to use for merging
>>>> (along with SEQUENTIAL). Had that flag been implemented well, it'd
>>>> give better results than our workaround using O_DIRECT.
>>>>
>>>> Anyway, given how things are, until we can get more control (waaaay
>>>> up in Javaland) over the buffer cache, O_DIRECT (via native directory
>>>> impl through JNI) is our only real option, today.
>>>>
>>>> More details here:
>>>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
>>>>
>>>> Note that other OSs likely do a better job and actually implement
>>>> NOREUSE, and similar APIs, so the generic Unix/WindowsNativeDirectory
>>>> would simply use NOREUSE on these platforms for I/O during segment
>>>> merging.
>>>>
>>>> Mike
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Wed, Apr 6, 2011 at 11:56 AM, Varun Thacker <varunthacker1...@gmail.com> wrote:
>>>> > Hi. I wrote some sample code to test the speed difference between SEQUENTIAL
>>>> > and O_DIRECT (I used the madvise flag MADV_DONTNEED) reads.
>>>> >
>>>> > This is the link to the code: http://pastebin.com/8QywKGyS
>>>> >
>>>> > There was a speed difference when I switched between the two flags. I
>>>> > have not used the O_DIRECT flag because Linus had criticized it.
>>>> >
>>>> > Is this what the flags are intended to be used for? This is just
>>>> > sample code with a test file.
>>>> >
>>>> > On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer <simon.willna...@googlemail.com> wrote:
>>>> >> Hey Varun,
>>>> >> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>> >>> Hi Varun,
>>>> >>>
>>>> >>> Those two issues would make a great GSoC! Comments below...
>>>> >> +1
>>>> >>>
>>>> >>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker <varunthacker1...@gmail.com> wrote:
>>>> >>>
>>>> >>>> I would like to combine two tasks as part of my project, namely
>>>> >>>> "Directory createOutput and openInput should take an IOContext"
>>>> >>>> (LUCENE-2793), complemented by "Generalize DirectIOLinuxDir to
>>>> >>>> UnixDir" (LUCENE-2795).
>>>> >>>>
>>>> >>>> The first part of the project is aimed at significantly reducing the time
>>>> >>>> taken to search during indexing by adding an IOContext which would
>>>> >>>> store buffer size and have options to bypass the OS’s buffer cache
>>>> >>>> (this is what causes the slowdown in search) and other hints. Once
>>>> >>>> completed I would move on to LUCENE-2795 and generalize the Directory
>>>> >>>> implementation to make a UnixDirectory.
>>>> >>>
>>>> >>> So, the first part (LUCENE-2793) should cause no change at all to
>>>> >>> performance, functionality, etc., because it's "merely" installing the
>>>> >>> plumbing (IOContext threaded throughout the low-level store APIs in
>>>> >>> Lucene) so that higher levels can send important details down to the
>>>> >>> Directory. We'd fix IndexWriter/IndexReader to fill out this
>>>> >>> IOContext with the details (merging, flushing, new reader, etc.).
>>>> >>>
>>>> >>> There's some fun/freedom here in figuring out just what details should
>>>> >>> be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
>>>> >>> or is it high level "I am opening a new near-real-time reader").
>>>> >>>
>>>> >>> This first step is a rote cutover, just changing APIs but in no way
>>>> >>> taking advantage of the new APIs.
>>>> >>>
>>>> >>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
>>>> >>> by creating a UnixDir impl that, using JNI (C code), passes advanced
>>>> >>> flags when opening files, based on the incoming IOContext.
>>>> >>>
>>>> >>> The goal is a single UnixDir that has ifdefs so that it's usable
>>>> >>> across multiple Unices, and eg would use direct IO if the context is
>>>> >>> merging. If we are ambitious we could rope Windows into the mix, too,
>>>> >>> and then this would be NativeDir...
>>>> >>>
>>>> >>> We can measure success by validating that a big merge while searching
>>>> >>> does not hurt search performance? (Ie we should be able to reproduce
>>>> >>> the results from
>>>> >>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>>>> >>
>>>> >> Thanks for the summary Mike!
>>>> >>>
>>>> >>>> I have spoken to Michael McCandless and Simon Willnauer about
>>>> >>>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>>>> >>>> I would love to be able to contribute to and learn from the Apache Lucene
>>>> >>>> community this summer. Also I would love suggestions on how to make my
>>>> >>>> application proposal stronger.
>>>> >>>
>>>> >>> I think either Simon or I can be the "official" mentor, and then the
>>>> >>> other one of us (and other Lucene committers) will support/chime
>>>> >>> in...
>>>> >>
>>>> >> I will take the official responsibility here once we are there!
>>>> >> simon
>>>> >>>
>>>> >>> This is an important change for Lucene!
>>>> >>>
>>>> >>> Mike
>>>> >>>
>>>> >>> ---------------------------------------------------------------------
>>>> >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> >>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>> >
>>>> > --
>>>> > Regards,
>>>> > Varun Thacker
>>>> > http://varunthacker.wordpress.com

--
Regards,
Varun Thacker
http://varunthacker.wordpress.com
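
For reference, here is a rough sketch of the IOContext idea discussed in the thread above: a high-level hint (merging, flushing, reading) is handed down to the Directory, which then picks the low-level settings. Every name in it (IOContextSketch, Usage, useDirectIO, bufferSize) is hypothetical and not taken from Lucene's API. The 4 KB and 1 MB sizes are the ones mentioned in the thread, and the rule "direct IO only when merging" is an assumption based on Mike's description. The JNI/O_DIRECT part itself is not shown.

```java
import java.util.Objects;

// Hypothetical sketch only: these names are illustrative and are NOT Lucene's API.
final class IOContextSketch {

    /** High-level reason a file is being opened, filled in by the caller. */
    enum Usage { MERGE, FLUSH, READ }

    /** Immutable hint object that a Directory-like class could receive. */
    static final class IOContext {
        final Usage usage;

        IOContext(Usage usage) {
            this.usage = Objects.requireNonNull(usage, "usage");
        }
    }

    // 4 KB: the buffer size the thread says is used during merging today.
    static final int DEFAULT_BUFFER_BYTES = 4 * 1024;
    // 1 MB: the size the thread suggests when O_DIRECT is in use.
    static final int DIRECT_IO_BUFFER_BYTES = 1024 * 1024;

    /** Assumption: only merges bypass the OS buffer cache. */
    static boolean useDirectIO(IOContext ctx) {
        return ctx.usage == Usage.MERGE;
    }

    /** Picks a buffer size from the context instead of hard-coding it at the call site. */
    static int bufferSize(IOContext ctx) {
        return useDirectIO(ctx) ? DIRECT_IO_BUFFER_BYTES : DEFAULT_BUFFER_BYTES;
    }

    public static void main(String[] args) {
        // Print the decision the sketch makes for each kind of usage.
        for (Usage usage : Usage.values()) {
            IOContext ctx = new IOContext(usage);
            System.out.println(usage + ": directIO=" + useDirectIO(ctx)
                    + ", bufferBytes=" + bufferSize(ctx));
        }
    }
}
```

Keeping the context high level (what the caller is doing) rather than low level (which buffer size to use) leaves the platform-specific decision inside the Directory implementation, which is one of the two options raised in the thread.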