Re: thread safe shared IndexSearcher

Jay Yu Mon, 24 Sep 2007 09:47:29 -0700

I'd be very interested to see your test results and codes. Thanks!


Mark Miller wrote:

I sat down over the weekend and rewrote the code from scratch so that I could improve and simplify it somewhat. I also did some testing of the synch costs, and they are insignificant compared to the total time to parse a query and run a search. I'll try to get around to posting the code tonight.
- Mark

Jay Yu wrote:
Mark Miller wrote:
Good luck Jay. Keep in mind, pretty much all LuceneIndexAccessor does is sync Readers with Writers and allow multiple threads to share the same instances of them -- nothing more. The code just forces Readers to refresh when Writers are used to change the index. There really isn't any functionality beyond that offered. Since you want to have a multi-threaded system access the same resources (which occasionally need to be refreshed), it's not too easy to get around a synchronized block.
If I am able to extract some usable code for you soon I will let you know.
I will appreciate it!
Thanks for your help!
- Mark

Jay Yu wrote:
Mark,

Thanks for sharing your valuable experience and thoughts.
Frankly, our system already has most of the functionality LuceneIndexAccessor offers. The only thing I am looking for is to sync the searchers' close. That's why I am a little worried about the way the accessor handles the searcher sync.
I will probably give it a try to see how it performs in our system.

Thanks!

Jay

Mark Miller wrote:
The method is synched, but this is because each thread *does* share the same Searcher. To maintain a cache of searchers across multiple threads, you've got to sync -- to reference count, you've got to sync. The performance hit of LuceneIndexAccessor is pretty minimal for its functionality, and frankly, for the functionality you want, you have to pay a cost. That's not even the end of it really...you're going to need to maintain a cache of Accessor objects for each index as well...and if you don't know all the indexes at startup time, access to this will also need to be synched. I wouldn't worry though -- searches are still lightning fast...that won't be the bottleneck. I'll work on getting you some code, but if you're worried, try some benchmarking on the original code.
Also, to be clear, I don't have the code in front of me, but getting a Searcher does not require waiting for a Writer to be released. Searchers are cached and reused (and instantly available) until a Writer is released. When this happens, the release Writer method waits for all the Searchers to return (happens pretty quick as searches are pretty quick), the Searcher cache is cleared, and then subsequent calls to getSearcher create new Searchers that can see what the Writer added.
The key is to use your Writer/Searcher/Reader quickly and then release it (unless you're bulk loading). I've had such a system with 5+ million docs on a standard machine and searches were still well below a second after the first Searcher is cached (and even the first search is darn quick). And that includes a lot of extra crap I am doing.
- Mark
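
The caching and reference counting Mark describes above could be sketched roughly as follows. This is only an illustration and not the LuceneIndexAccessor source: the class name SharedSearcherCache, the type parameter S (standing in for an IndexSearcher), the opener supplier, and the writerReleased() method are all hypothetical names. The choice to pause new getSearcher() callers while the cache is being drained is also an assumption made here to keep the sketch simple.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a reference-counted searcher cache.
 * S stands in for a Lucene IndexSearcher; none of these names come from
 * LuceneIndexAccessor itself.
 */
class SharedSearcherCache<S> {
    private final Supplier<S> opener; // opens a searcher over the current index
    private S cached;                 // the shared instance, or null if none is open
    private int inUse;                // searchers handed out and not yet released
    private boolean draining;         // true while a writer release waits for searchers

    SharedSearcherCache(Supplier<S> opener) {
        this.opener = opener;
    }

    /** Hands out the shared searcher, opening one only if the cache is empty. */
    synchronized S getSearcher() {
        while (draining) {
            await(); // assumption: new callers pause during the short drain
        }
        if (cached == null) {
            cached = opener.get();
        }
        inUse++;
        return cached;
    }

    /** Gives back a searcher previously obtained from getSearcher(). */
    synchronized void release(S searcher) {
        inUse--;
        notifyAll(); // wake a pending writerReleased() call, if any
    }

    /** Call after a writer is released: waits for searchers to return, then clears the cache. */
    synchronized void writerReleased() {
        while (draining) {
            await(); // only one drain at a time
        }
        draining = true;
        try {
            while (inUse > 0) {
                await();
            }
            cached = null; // a real implementation would also close the old searcher here
        } finally {
            draining = false;
            notifyAll();
        }
    }

    /** Waits on this object's monitor; must be called from a synchronized method. */
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        int[] opened = {0};
        SharedSearcherCache<String> cache =
                new SharedSearcherCache<>(() -> "searcher-" + (++opened[0]));
        String a = cache.getSearcher();
        String b = cache.getSearcher();
        System.out.println(a == b); // both callers share one instance
        cache.release(a);
        cache.release(b);
        cache.writerReleased();     // index changed, so drop the cached searcher
        System.out.println(cache.getSearcher());
    }
}
```

The synchronized blocks are held only for the bookkeeping, never for the search itself, which is consistent with Mark's point that the sync cost is small next to parsing a query and running a search.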

Jay Yu wrote:
Mark,
After reading the implementation of LuceneIndexAccessor.getSearcher(), I realized that the method is synchronized and waits for writingDirector to be released. That means if we call getSearcher for each query in each thread, there might be contention and a performance hit. In fact, even the release(searcher) method is costly. On the other hand, if multiple threads share one searcher then it'd defeat the purpose of using LuceneIndexAccessor.
Am I missing something here? What's your suggested use case for LuceneIndexAccessor?
Thanks!

Jay
Mark Miller wrote:
I'll respond a point at a time:

1.

****************************** Hi Maik,

So what happens in this case:
IndexAccessProvider accessProvider = new IndexAccessProvider(directory, analyzer);
LuceneIndexAccessor accessor = new LuceneIndexAccessor(accessProvider);
accessor.open();

IndexWriter writer = accessor.getWriter();

// reference to the same instance?

IndexWriter writer2 = accessor.getWriter();

writer.addDocument(....);

writer2.addDocument(....);



// I didn't release the writer yet

// will this block?

IndexReader reader = accessor.getReader();

reader.delete(....);

************
This is not really an issue. First, if you are going to delete with a Reader you need to call getWritingReader and not getReader. When you do that, the getWritingReader call will block until writer and writer2 are released. If you are just adding a couple docs before releasing the writers, this is no problem because the block will be very short. If you are loading tons of docs and you want to be able to delete with a Reader in a timely manner, you should release the writers every now and then (release and re-get the Writer every 100 docs or something). An interactive index should not hog the Writer, while something that is just loading a lot could hog the Writer. This is no different than normal…you cannot delete with a Reader while adding with a Writer with Lucene. This code just enforces those semantics.
The best solution is to just use a Writer to delete – I never get a ReadingWriter.

2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
This is no big deal either. I just added another getWriter call that takes a create Boolean.
3. I don't think there is a latest release. This has never gotten much official attention and is not in the sandbox. I worked straight from the originally submitted code.

4. I will look into getting together some code that I can share. The multisearcher changes that are needed are a couple of one-liners really, so at a minimum I will give you the changes needed.



-       Mark
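
The advice in point 1 above (release and re-get the Writer every 100 docs or so during a bulk load) might look something like this. It is a sketch under stated assumptions rather than code from LuceneIndexAccessor: the WriterAccessor interface, the BatchedLoader class, and the load() helper are hypothetical stand-ins, with W playing the role of an IndexWriter and D a document.

```java
import java.util.function.BiConsumer;

/** Hypothetical helper that releases the shared writer between batches. */
class BatchedLoader {

    /** Stand-in for an accessor that hands out and takes back a shared writer. */
    interface WriterAccessor<W> {
        W getWriter();
        void release(W writer);
    }

    /**
     * Adds every document, releasing and re-getting the writer after each
     * batchSize documents so that blocked readers get a turn.
     * Returns the number of times the writer was released.
     */
    static <W, D> int load(WriterAccessor<W> accessor, Iterable<D> docs,
                           int batchSize, BiConsumer<W, D> addDocument) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int releases = 0;
        int inBatch = 0;
        W writer = null;
        try {
            for (D doc : docs) {
                if (writer == null) {
                    writer = accessor.getWriter(); // start of a new batch
                }
                addDocument.accept(writer, doc);
                if (++inBatch == batchSize) {
                    accessor.release(writer);      // let pending deletes proceed
                    writer = null;
                    inBatch = 0;
                    releases++;
                }
            }
        } finally {
            if (writer != null) {
                accessor.release(writer);          // release the final partial batch
                releases++;
            }
        }
        return releases;
    }
}
```

With 250 documents and a batch size of 100, the writer is obtained and released three times, so a getWritingReader call never waits for more than one batch.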



On 9/19/07, Jay Yu <[EMAIL PROTECTED]> wrote:

Mark,
thanks for sharing your insight and experience about LuceneIndexAccessor!
I remember seeing some people reporting some issues about it, such as:
http://www.archivum.info/[EMAIL PROTECTED]/2005-05/msg00114.html
http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3



Have those issues been resolved?
Where did you get the latest release? It is not in the official Lucene sandbox/contrib.
Finally, are you willing to share your extended version to include your tweak relating to the MultiSearcher?



Thanks a lot!



Jay



Mark Miller wrote:
I use option 3 extensively and find it very effective. There is a tweak or two required to get it to work right with MultiSearchers, but other than that, the code is great. I have built a lot on top of it. I'm on the list all the time and would be happy to answer any questions you have in regards to LuceneIndexAccessor. Frankly, I think it's overlooked far too much.
- Mark
On 9/19/07, Jay Yu <[EMAIL PROTECTED]> wrote:
In a multithreaded app like a web app, a shared IndexSearcher could throw an AlreadyClosedException when another thread is trying to update the underlying IndexReader by closing the shared searcher after the index is updated. Searching over the past discussions on this mailing list, I found several approaches to solve the problem.
1. use solr
2. use DelayCloseIndexSearcher
3. use LuceneIndexAccessor
The first one is not feasible for us; some people seemed to have problems with No. 2, and I do not find a lot of discussion around No. 3.
I wonder if anyone has good experience with No. 2 and 3?
Or am I missing other, better solutions?
Thanks for any suggestion/comment!
Jay
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]