Re: My GSOC proposal

Varun Thacker Wed, 06 Apr 2011 08:56:59 -0700

Hi. I wrote some sample code to test the speed difference between SEQUENTIAL
and O_DIRECT-style reads (for the latter I used the madvise flag MADV_DONTNEED).


This is the link to the code: http://pastebin.com/8QywKGyS

There was a speed difference when I switched between the two flags. I
have not used the O_DIRECT flag itself because Linus had criticized it.

Is this what the flags are intended to be used for? This is just sample
code run against a test file.
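The pastebin code is not reproduced here. As a rough idea of what such a test can look like, the following C sketch (an assumption, not Varun's actual code) maps a file, applies a single madvise hint to the whole mapping, reads every byte and times the pass. The helper names read_with_advice and time_read are hypothetical.

```c
#define _DEFAULT_SOURCE /* glibc: exposes madvise() and MADV_* */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only, apply the madvise(2) hint
 * `advice` to the whole mapping, then touch every byte. Returns the byte
 * sum (so the reads are not optimized away), or -1 on error, including an
 * empty file, which cannot be mapped. */
long long read_with_advice(const char *path, int advice)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    /* MADV_SEQUENTIAL: expect in-order access, so read ahead aggressively.
     * MADV_DONTNEED: drop this mapping's pages; later accesses fault them
     * back in from the file. Neither hint bypasses the page cache the way
     * O_DIRECT does. */
    if (madvise(p, len, advice) != 0) {
        munmap(p, len);
        return -1;
    }

    long long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}

/* Time one read_with_advice() pass; returns elapsed milliseconds and
 * stores the checksum in *sum_out. */
double time_read(const char *path, int advice, long long *sum_out)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_out = read_with_advice(path, advice);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling time_read() once with MADV_SEQUENTIAL and once with MADV_DONTNEED on a file much larger than a page gives a comparison of the kind described above. For meaningful cold-cache numbers the page cache has to be emptied between runs (on Linux, as root, by writing 3 to /proc/sys/vm/drop_caches).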

On Wed, Apr 6, 2011 at 12:11 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:
> Hey Varun,
> On Tue, Apr 5, 2011 at 11:07 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> Hi Varun,
>>
>> Those two issues would make a great GSoC!  Comments below...
> +1
>>
>> On Tue, Apr 5, 2011 at 1:56 PM, Varun Thacker
>> <varunthacker1...@gmail.com> wrote:
>>
>>> I would like to combine two tasks as part of my project, namely
>>> "Directory createOutput and openInput should take an IOContext"
>>> (LUCENE-2793), and complement it with "Generalize DirectIOLinuxDir to
>>> UnixDir" (LUCENE-2795).
>>>
>>> The first part of the project aims to significantly reduce the time
>>> searches take while indexing is in progress, by adding an IOContext
>>> that would store the buffer size, options to bypass the OS’s buffer
>>> cache (which is what causes the slowdown in search), and other hints.
>>> Once that is complete I would move on to LUCENE-2795 and generalize
>>> the Directory implementation into a UnixDirectory.
>>
>> So, the first part (LUCENE-2793) should cause no change at all to
>> performance, functionality, etc., because it's "merely" installing the
>> plumbing (IOContext threaded throughout the low-level store APIs in
>> Lucene) so that higher levels can send important details down to the
>> Directory.  We'd fix IndexWriter/IndexReader to fill out this
>> IOContext with the details (merging, flushing, new reader, etc.).
>>
>> There's some fun/freedom here in figuring out just what details should
>> be included in IOContext... (eg: is it low level "set buffer size to 4 KB"
>> or is it high level "I am opening a new near-real-time reader").
>>
>> This first step is a rote cutover, just changing APIs but in no way
>> taking advantage of the new APIs.
>>
>> The 2nd step (LUCENE-2795) would then take advantage of this plumbing,
>> by creating a UnixDir impl that, using JNI (C code), passes advanced
>> flags when opening files, based on the incoming IOContext.
>>
>> The goal is a single UnixDir that has ifdefs so that it's usable
>> across multiple Unices, and eg would use direct IO if the context is
>> merging.  If we are ambitious we could rope Windows into the mix, too,
>> and then this would be NativeDir...
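A hypothetical C sketch of the native side of such a UnixDir: the Java IOContext is assumed to have been reduced to a small enum before the JNI call, and merges get O_DIRECT wherever the platform defines it. The enum io_context, its values, the two function names, the EINVAL fallback and the macOS F_NOCACHE branch are all illustrative assumptions, not the LUCENE-2795 design.

```c
#define _GNU_SOURCE /* glibc: exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical native mirror of a Java-side IOContext. */
enum io_context { CTX_DEFAULT, CTX_READ, CTX_FLUSH, CTX_MERGE };

/* open(2) flags for reading a file in context `ctx`. */
int open_flags_for(enum io_context ctx)
{
    int flags = O_RDONLY;
#ifdef O_DIRECT
    /* Where O_DIRECT exists (e.g. Linux): a merge reads each byte once,
     * so bypass the page cache instead of evicting pages searches need. */
    if (ctx == CTX_MERGE)
        flags |= O_DIRECT;
#endif
    return flags;
}

/* Open `path` read-only for context `ctx`; returns an fd, or -1 with
 * errno set. */
int open_for_context(const char *path, enum io_context ctx)
{
    assert(path != NULL);
    int flags = open_flags_for(ctx);
    int fd = open(path, flags);
#ifdef O_DIRECT
    /* Filesystems without O_DIRECT support fail the open with EINVAL;
     * fall back to a normal buffered open rather than failing the merge. */
    if (fd < 0 && errno == EINVAL && (flags & O_DIRECT))
        fd = open(path, flags & ~O_DIRECT);
#elif defined(F_NOCACHE)
    /* macOS has no O_DIRECT; F_NOCACHE turns off caching for this fd. */
    if (fd >= 0 && ctx == CTX_MERGE)
        (void)fcntl(fd, F_NOCACHE, 1);
#endif
    return fd;
}
```

One cost of this design: reads on an O_DIRECT descriptor generally need the buffer address, file offset and length aligned to the filesystem's block size, so the native read path would have to manage aligned buffers.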
>>
>> We can measure success by validating that a big merge while searching
>> does not hurt search performance?  (Ie we should be able to reproduce
>> the results from
>> http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html).
>
> Thanks for the summary, Mike!
>>
>>> I have spoken to Michael McCandless and Simon Willnauer about
>>> undertaking these tasks. Michael McCandless has agreed to mentor me.
>>> I would love to contribute to, and learn from, the Apache Lucene
>>> community this summer. I would also welcome suggestions on how to make
>>> my application proposal stronger.
>>
>> I think either Simon or I can be the "official" mentor, and then the
>> other one of us (and other Lucene committers) will support/chime
>> in...
>
> I will take the official responsibility here once we are there!
> simon
>>
>> This is an important change for Lucene!
>>
>> Mike
>>
>



-- 


Regards,
Varun Thacker
http://varunthacker.wordpress.com
