Re: thread safe shared IndexSearcher

Mark Miller Tue, 25 Sep 2007 12:56:47 -0700

Agreed. Perhaps I will abandon the static init. I really only put it in as an option due to your synchronized cost concerns (a preload allows non-synchronized, read-only access to the IndexAccessor cache). Do keep in mind that you don't have to use it though... if you don't preload, accessors are created on demand but require you to go through a synchronized block.
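
To illustrate the trade-off described above, here is a minimal sketch in plain Java. The names AccessorCache and Accessor are hypothetical stand-ins, not the actual IndexAccessor classes. Entries preloaded in the constructor are read without locking; anything else is created on demand inside a synchronized block.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an index accessor; not the real IndexAccessor API. */
final class Accessor {
    final String dir;

    Accessor(String dir) {
        this.dir = dir;
    }
}

/** Sketch of a cache with a preloaded, read-only part and a synchronized on-demand part. */
final class AccessorCache {
    // Built once in the constructor and never mutated afterwards,
    // so lookups here need no synchronization.
    private final Map<String, Accessor> preloaded;

    // Guarded by 'this': entries are added lazily.
    private final Map<String, Accessor> onDemand = new HashMap<>();

    AccessorCache(String... preloadDirs) {
        Map<String, Accessor> m = new HashMap<>();
        for (String d : preloadDirs) {
            m.put(d, new Accessor(d));
        }
        this.preloaded = Collections.unmodifiableMap(m);
    }

    Accessor get(String dir) {
        Accessor a = preloaded.get(dir); // lock-free path for preloaded dirs
        if (a != null) {
            return a;
        }
        synchronized (this) {            // on-demand path pays for the lock
            return onDemand.computeIfAbsent(dir, Accessor::new);
        }
    }
}
```

Dropping the preload would simply mean every lookup takes the synchronized path, which costs little if accessors are cached and rarely created.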

I have some ideas and I will be making an attempt to smooth this all out tonight. Thanks for your input.


- Mark

Jay Yu wrote:

I agree with you on the compromise aspect of the design.
In particular, I think it's hard to preload all the index accessors in the static init while allowing users to specify the analyzer for each dir without requiring a complicated config file and using reflection. So a good compromise might be to abandon preloading the accessors. After all, the accessors are cached and not created often.
Thanks!

Jay


Mark Miller wrote:
I think it's just a compromise in the design, though it could be improved. You only ever want a single Writer at a time on the index. Those two flags are really just hints for when a Writer is first opened... should it auto-commit and should it overwrite/create... if a thread tries to write concurrently with another thread, they will briefly share a Writer, but generally a new Writer is created fairly often.
The general strategy should be to pick constant values and always pass them. There is an opening for the issue that you have a Writer and are adding a doc, and then before releasing that Writer, a Writer request from another thread tries to clear the index with create=true, and it won't work. That's not a big concern though.
So the problem really is that these params control what happens when a new Writer is created, but you're not guaranteed to be creating a Writer; it may be cached. You really should pass the same autocommit flag, though it's not necessary. I am open to suggestions for a more coherent design, but functionally, it does work. I am also thinking about how to handle the Analyzer, and I think the solution (the need to init some IndexAccessor params) might involve all these issues.
- Mark
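
The behavior discussed here can be sketched in plain Java. This is only an illustration under assumptions: SingleWriterCache and SketchWriter are hypothetical names, not the DefaultIndexAccessor code. The two flags take effect only when a writer is actually created; while a writer is checked out, later callers share it and their flags are ignored.

```java
/** Hypothetical stand-in for a writer; it only records the flags it was opened with. */
final class SketchWriter {
    final boolean autoCommit;
    final boolean create;

    SketchWriter(boolean autoCommit, boolean create) {
        this.autoCommit = autoCommit;
        this.create = create;
    }
}

/** Sketch of a single cached, reference-counted writer. */
final class SingleWriterCache {
    private SketchWriter cached; // guarded by 'this'
    private int refCount;        // guarded by 'this'

    synchronized SketchWriter getWriter(boolean autoCommit, boolean create) {
        if (cached == null) {
            // The flags are only honored here, when a new writer is opened.
            cached = new SketchWriter(autoCommit, create);
        }
        refCount++;
        return cached;
    }

    synchronized void release(SketchWriter w) {
        if (w != cached || refCount == 0) {
            throw new IllegalStateException("writer is not checked out");
        }
        if (--refCount == 0) {
            cached = null; // a real implementation would close the writer here
        }
    }
}
```

With this shape, a caller asking for create=true while another thread still holds the writer gets the existing writer back. Always passing the same constant values sidesteps that surprise.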

Jay Yu wrote:
Mark,
Looking at your implementation of the DefaultIndexAccessor regarding the writer, I think there could be a problem: you have only one cached writer but getWriter(boolean, boolean) takes 2 booleans, so ideally you need 4 cached writers. Otherwise, if one starts with a writer that overwrites the existing index, then later he cannot append docs to the index. Am I missing something here, or have you not finished the implementation of getWriter yet?
Thanks!

Jay

Mark Miller wrote:
Ah, thanks for catching that. One of the pieces I did not finish... the keyword analyzer was placeholder code.
I will take your comments into account and update the code.
I have some other pieces to polish as well. Previously, I extended and built upon the original code, but I can't give it away, so this is my attempt at something lesser, but cleaner.
Jay Yu wrote:
Thanks for the tip.
One small improvement on the IndexAccessorFactory might be to allow the user to specify the Analyzer instead of using a default KeywordAnalyzer, which of course will make your static init of the cached accessors difficult unless you add more interfaces to the accessor to allow resetting the analyzer/Dir as in my own version.
Jay

Mark Miller wrote:
One final note... if you are using the IndexAccessor and you are only accessing the index from one JVM, you can use the NoLockFactory and save some sync cost there.
Jay Yu wrote:
Mark,
Great effort getting the original Lucene index accessor package in this shape. I am sure this will benefit a lot of people using Lucene in a multithreaded env.
I have a quick question to ask you:
Do you have to use the core Lucene 2.3-dev in order to use the accessor?
I will take a look at your code to see if I could help. I used a slightly modified version of the original package in my project but it breaks some of my tests. I hope your version works better.
Thanks a lot!

Jay


Mark Miller wrote:
I have sat down and rewritten IndexAccessor from scratch. I copied in the same reference counting logic, pruned some things, and tried to make the whole package a bit simpler to use. I have a few things to do, but it's pretty solid already. The only major thing I'd still like to do is add an option to warm searchers before putting them in the Searcher cache. I'd like to write some more tests as well. Any help greatly appreciated if you're interested in using the thing.
http://myhardshadow.com/indexaccessor/trunk/src/test/com/mhs/indexaccessor/SimpleSearchServer.java
Here is an example of a class that can be instantiated in one of multiple threads and read/modify a single index without worrying about what any of the other threads are doing to the index at any given time. This is a very simple example of how to use the IndexAccessor and not necessarily an example of best practices. The main idea is that you get your Writer, Searcher, or Reader, and then be sure to release it as soon as you're done with it in a finally block. For loading, you will want to load many docs with a Writer (batch them) before releasing it, but remember that Readers will not get a new view of the index until you release all of the Writers. So beware of hogging a Writer unless that's what you're intending.
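
The get/release-in-finally pattern and the visibility rule can be sketched without Lucene. This is a rough illustration under assumptions: BatchAccessorSketch and BatchLoader are hypothetical names, and the doc counters merely stand in for a real index. Docs added through a writer become visible to readers only once every outstanding writer has been released.

```java
/** Hypothetical accessor sketch: counts outstanding writers and tracks doc visibility. */
final class BatchAccessorSketch {
    private int writersOut;  // guarded by 'this'
    private int pendingDocs; // added but not yet visible to readers
    private int visibleDocs; // what a reader would currently see

    synchronized void acquireWriter() {
        writersOut++;
    }

    synchronized void addDoc() {
        if (writersOut == 0) {
            throw new IllegalStateException("no writer checked out");
        }
        pendingDocs++;
    }

    synchronized void releaseWriter() {
        if (writersOut == 0) {
            throw new IllegalStateException("no writer to release");
        }
        if (--writersOut == 0) {
            // Last writer released: readers now get a new view of the index.
            visibleDocs += pendingDocs;
            pendingDocs = 0;
        }
    }

    synchronized int visibleDocs() {
        return visibleDocs;
    }
}

final class BatchLoader {
    /** Adds n docs in one batch and always releases the writer, even on failure. */
    static void loadBatch(BatchAccessorSketch accessor, int n) {
        accessor.acquireWriter();
        try {
            for (int i = 0; i < n; i++) {
                accessor.addDoc();
            }
        } finally {
            accessor.releaseWriter();
        }
    }
}
```

If some other thread is still holding a writer when loadBatch returns, the batch stays invisible until that thread releases too, which is why hogging a Writer delays everyone's view of new docs.
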
JavaDoc:
http://myhardshadow.com/indexaccessorapi/

Code:
http://myhardshadow.com/indexaccessor/trunk/

Jar:
http://myhardshadow.com/indexaccessorreleases/indexaccessor.jar


Your synchronized block concerns:
The synchronized blocks that control access to the IndexAccessor do not have a huge impact on performance. Keep in mind that not all of the work is done in a synchronized block, just the retrieval of the Searcher, Writer, Reader. Even if the synchronization makes the method twice as expensive, it is still overpowered by the cost of parsing queries and searching the index. This applies with or without contention. I wrote a simple test and included the output below. You might use the IBM Lock Analyzer for Java to further analyze these costs. Trust me, this thing is speedy. It's many times better than using IndexModifier.
Without Contention
Just retrieve and release Searcher 100000 times
----
avg time:6.3E-4 ms
total time:63 ms

Parse query and search on 1 doc 100000 times
----
avg time:0.03107 ms
total time:3107 ms


With Contention (40 other threads running 80000 searches)
Just retrieve and release Searcher 100000 times
----
avg time:0.04643 ms
total time:4643 ms

Parse query and search on 1 doc 100000 times
----
avg time:0.64337 ms
total time:64337 ms
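
For anyone wanting to repeat a measurement of this kind, here is a minimal timing sketch in plain Java. It is not the test that produced the numbers above: AcquireReleaseTimer is a hypothetical stand-in whose synchronized acquire/release only bump a counter, and the query parsing and searching are left as a placeholder comment.

```java
/** Hypothetical stand-in for retrieving and releasing a Searcher under a lock. */
final class AcquireReleaseTimer {
    private int refCount; // guarded by 'this'

    synchronized void acquire() {
        refCount++;
    }

    synchronized void release() {
        refCount--;
    }

    synchronized int outstanding() {
        return refCount;
    }

    /** Returns total elapsed nanoseconds for n acquire/release pairs. */
    static long timeLoop(AcquireReleaseTimer t, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            t.acquire();
            try {
                // query parsing and searching would go here
            } finally {
                t.release();
            }
        }
        return System.nanoTime() - start;
    }

    /** Formats totals in the same style as the output above. */
    static String report(long totalNanos, int n) {
        double totalMs = totalNanos / 1_000_000.0;
        return "avg time:" + (totalMs / n) + " ms, total time:" + totalMs + " ms";
    }
}
```

Calling AcquireReleaseTimer.timeLoop(new AcquireReleaseTimer(), 100000) after a warm-up run gives a rough uncontended figure. A contended figure would need extra threads hammering the same instance, and results will vary with the JVM and hardware.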


- Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
