Re: IndexSearcher and IndexWriter.rollback
Q1. Are you using a SearcherManager or a direct IndexSearcher? If you are using a SearcherManager, you could just call `maybeRefresh()` and then re-acquire a new `IndexSearcher`. The method docs <https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/search/ReferenceManager.html#maybeRefresh()> also mention that it is fine to call `maybeRefresh` on multiple threads concurrently. Only the first thread will attempt the refresh; subsequent threads will see that another thread is already handling refresh and will return immediately. Q2. I don't think the IW exposes an interface to rollback to a commit without closing the writer. Hope this helps. Gautam Worah. On Thu, Apr 14, 2022 at 6:35 AM wrote: > I’m using an IndexSearcher created from an IndexWriter (NRT mode).
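A minimal sketch of the maybeRefresh/acquire/release cycle described above, assuming the Lucene 9.x API; the class and method names here are illustrative, not from the original thread:

import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherManager;

class NrtRefreshSketch {
    // One manager per writer, created once at startup (null = default SearcherFactory).
    static SearcherManager open(IndexWriter writer) throws IOException {
        return new SearcherManager(writer, null);
    }

    static void searchOnce(SearcherManager manager, Query query) throws IOException {
        manager.maybeRefresh();                     // safe to call from any thread
        IndexSearcher searcher = manager.acquire(); // pinned point-in-time view
        try {
            searcher.search(query, 10);
        } finally {
            manager.release(searcher);              // never close the searcher yourself
        }
    }
}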
IndexSearcher and IndexWriter.rollback
I’m using an IndexSearcher created from an IndexWriter (NRT mode). Up until now the IW was kept open forever. I want to properly handle cases where an indexing task failed and call IW.rollback to discard the incomplete changes. The problem I’m facing is that rollback also closes the writer. Q1: Can I somehow keep using the same IndexSearcher instance after the writer is closed? Q2: Can I rollback the changes without closing the writer? Creating a new IndexSearcher is possible but can be a bit fragile as it is used by many threads concurrently.
Re: Can an indexreader/indexsearcher survive index edits?
Hi, If you continue to use your code without any changes your searcher should still work but it won't return newly indexed documents or reflect deletes. You can consider using a SearcherManager in your searching process and periodically (use a thread maybe?) ask it to `maybeRefresh()`. Then the next time you call `acquire()` on this SearcherManager, you will get an updated Searcher that can reflect the new incremental changes the other thread has made on the index. Useful references: Search using a SearcherManager (has a code example similar to your situation) <https://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html> Near real time search with a SearcherManager (faster than the above approach) <https://blog.mikemccandless.com/2011/11/near-real-time-readers-with-lucenes.html> Similar stackoverflow question <https://stackoverflow.com/questions/45275557/lucene-near-real-time-search> - Gautam Worah. On Wed, Sep 22, 2021 at 1:10 PM Trevor Nicholls wrote: > Hi > > Lucene 8.6.3 > > In a prototype application I build a Lucene index with a single process and query it with another.
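A sketch of the periodic-refresh idea above, assuming the search service opens a SearcherManager on the index Directory (Lucene 8.x APIs; variable names are illustrative). Note that changes made by the separate indexing process only become visible after that process commits:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;

class PeriodicRefreshSketch {
    static ScheduledExecutorService startRefresher(Directory dir) throws IOException {
        final SearcherManager manager = new SearcherManager(dir, null);
        ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
        refresher.scheduleWithFixedDelay(() -> {
            try {
                manager.maybeRefresh(); // picks up changes committed by the indexing process
            } catch (IOException e) {
                // log and carry on; previously acquired searchers stay valid
            }
        }, 1, 1, TimeUnit.SECONDS);
        return refresher; // caller shuts this down; acquire()/release() as usual elsewhere
    }
}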
Can an indexreader/indexsearcher survive index edits?
Hi Lucene 8.6.3 In a prototype application I build a Lucene index with a single process and query it with another. Every operation is a new process. When the data changes I simply recreate the index and future searches pick up the new index. Of course performance is sub-optimal. So I am changing this so that after the initial build subsequent data changes will update the index rather than rebuilding the entire index. I am also changing the search method so that I have a single service which creates an IndexReader and IndexSearcher at startup, and reads and responds to search requests through a socket. I know that an existing index can be maintained with selective deletions and additions, but I am not sure if the process holding the reader and searcher objects can continue running without having to close and recreate them when the index is modified. Is it safe to do that? cheers T
Re: Query on searchAfter API usage in IndexSearcher
Are you specifying a sort clause on your query? I'm not totally sure, but I think having a sort clause might be a requirement for efficient deep paging. I know Solr's cursorMark feature uses the searchAfter API, and a cursorMark is essentially the sort values of the last document from the previous result: https://github.com/apache/lucene-solr/blob/e30264b31400a147507aabd121b1152020b8aa6d/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L1524-L1525 https://lucene.apache.org/solr/guide/7_3/pagination-of-results.html On Wed, May 9, 2018 at 4:56 AM, Jacky Li wrote: > I have encountered the same problem, I wonder if anyone knows the solution? > > Regards, > Jacky
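For reference, a hedged sketch of what sorted deep paging through the searchAfter API looks like; the "timestamp" field is hypothetical, and in recent Lucene versions a sort field needs doc values:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

class DeepPagingSketch {
    static void pageThrough(IndexSearcher searcher, Query query) throws IOException {
        Sort sort = new Sort(new SortField("timestamp", SortField.Type.LONG));
        TopDocs page = searcher.search(query, 10, sort);
        while (page.scoreDocs.length > 0) {
            // ... process page.scoreDocs ...
            ScoreDoc last = page.scoreDocs[page.scoreDocs.length - 1];
            // 'last' is a FieldDoc whose sort values act as the cursor
            page = searcher.searchAfter(last, query, 10, sort);
        }
    }
}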
Re: Query on searchAfter API usage in IndexSearcher
I have encountered the same problem, I wonder if anyone knows the solution? Regards, Jacky
Re: Query on searchAfter API usage in IndexSearcher
Hi Lucene Team, Can you please reply to my query? It's an urgent issue and we need to resolve it at the earliest. The Lucene version used is 6.3.0, but we even tried with the latest version, 7.3.0. Regards Manish Gupta
Query on searchAfter API usage in IndexSearcher
Hi Team, I am new to Lucene and I am trying to use Lucene for text search in my project to achieve better results in terms of query performance. Initially I was facing a lot of GC issues while using Lucene, as I was using the search API and passing the total document count. As my data size is around 4 billion, the number of documents created by Lucene was huge. Internally the search API uses TopScoreDocCollector, which internally creates a PriorityQueue of the given document count, thus causing a lot of GC. *To avoid this problem I am trying to query in a paginated way, wherein I query only 10 documents at a time and after that I use the searchAfter API to query further, passing the lastScoreDoc from the previous result. This has resolved the GC problem, but the query time has increased by a huge margin, from 3 sec to 600 sec.* *When I debugged I found that even though I use the searchAfter API, it is not avoiding the IO and every time it is reading the data from disk again. It is only skipping the results filled in the previous search. Is my understanding correct? If yes, please let me know if there is a better way to query the results in incremental order so as to avoid GC with minimal impact on query performance.* Regards Manish Gupta
Re: Implement an IndexSearcher which never returns any documents
Thanks, great solution. 2017-09-11 17:55 GMT+02:00 Adrien Grand <jpou...@gmail.com>: > You could create a `new IndexSearcher(new MultiReader());`
Re: Implement an IndexSearcher which never returns any documents
You could create a `new IndexSearcher(new MultiReader());` Le sam. 9 sept. 2017 à 19:40, Mitchell Stevenson <mitchell.stevenson...@gmail.com> a écrit : > I need to implement an IndexSearcher for Lucene 7 which never returns > any documents. > Is the following implementation suitable for this?
Implement an IndexSearcher which never returns any documents
I need to implement an IndexSearcher for Lucene 7 which never returns any documents. Is the following implementation suitable for this? The code seems to work nicely but I am not sure about it.

IndexSearcher noDocsSearcher = new IndexSearcher(new NoDocsReader());

public class NoDocsReader extends LeafReader {

    private final static Bits liveDocs = new Bits.MatchNoBits(0);

    public NoDocsReader() {
        tryIncRef(); // keep reader open
    }

    @Override
    public NumericDocValues getNumericDocValues(final String field) throws IOException {
        return new NumericDocValues() {
            @Override
            public long longValue() throws IOException { return 0; }

            @Override
            public boolean advanceExact(int target) throws IOException { return false; }

            @Override
            public int docID() { return 0; }

            @Override
            public int nextDoc() throws IOException { return 0; }

            @Override
            public int advance(int target) throws IOException { return 0; }

            @Override
            public long cost() { return 0; }
        };
    }

    @Override
    public BinaryDocValues getBinaryDocValues(final String field) throws IOException { return null; }

    @Override
    public SortedDocValues getSortedDocValues(final String field) throws IOException { return null; }

    @Override
    public SortedNumericDocValues getSortedNumericDocValues(final String field) throws IOException { return null; }

    @Override
    public SortedSetDocValues getSortedSetDocValues(final String field) throws IOException { return null; }

    @Override
    public NumericDocValues getNormValues(final String field) throws IOException { return null; }

    @Override
    public FieldInfos getFieldInfos() { return new FieldInfos(new FieldInfo[0]); }

    @Override
    public Bits getLiveDocs() { return liveDocs; }

    @Override
    public void checkIntegrity() throws IOException { }

    @Override
    public Fields getTermVectors(final int docID) throws IOException { return null; }

    @Override
    public int numDocs() { return 0; }

    @Override
    public int maxDoc() { return 0; }

    @Override
    public void document(final int docID, final StoredFieldVisitor visitor) throws IOException { }

    @Override
    protected void doClose() throws IOException { }

    @Override
    public boolean hasDeletions() { return false; }

    @Override
    public CacheHelper getCoreCacheHelper() { return null; }

    @Override
    public Terms terms(String field) throws IOException { return null; }

    @Override
    public PointValues getPointValues(String field) throws IOException { return null; }

    @Override
    public LeafMetaData getMetaData() { return null; }

    @Override
    public CacheHelper getReaderCacheHelper() { return null; }
}

Thanks Mitch
RE: Lucene IndexSearcher PrefixQuery search getting really slow after a while
Try to optimize your indexes. Sent securely from my iPhone From: Jason Wu Sent: Thursday, 3 November 2016 at 22:21:55 To: java-user@lucene.apache.org Subject: Lucene IndexSearcher PrefixQuery search getting really slow after a while
Lucene IndexSearcher PrefixQuery search getting really slow after a while
Hi Team, We have been using Lucene 4.8.1 to do info searches every day for years. However, recently we encountered some performance issues which greatly slow down the Lucene search. After the application has been running for a while, we face the issue below, where an IndexSearcher PrefixQuery takes much longer to search: [inline screenshot: PrefixQuery search timings] Our CPU and memory are fine, no leak found: [inline screenshot: CPU and memory charts] However, for exactly the same Java instance running on another box, the same search is very fast. I/O, memory and CPUs are all fine on both boxes. So, do you know any reasons that can cause this performance issue? Thank you, J.W
Re: Sorting IndexSearcher results by LongPoint with 6.0
Hi Jeremy, Yes. That's right. The question is whether you really need the stored field, but that's out of scope for this issue. Uwe Am 27. Mai 2016 01:21:48 MESZ, schrieb Jeremy Friesen: >Thanks for the help. So just to sum up, if I have a numeric field type that I want to be able to do a range query on, sort by, and also retrieve in the document as a stored value, I will need to add it to the document three times, as a NumericDocValuesField, as a LongPoint, and as a StoredField. >Does that sound correct? -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de
Re: Sorting IndexSearcher results by LongPoint with 6.0
Thanks for the help. So just to sum up, if I have a numeric field type that I want to be able to do a range query on, sort by, and also retrieve in the document as a stored value, I will need to add it to the document three times: as a NumericDocValuesField, as a LongPoint, and as a StoredField. Does that sound correct? On Thu, May 26, 2016 at 3:43 PM, Uwe Schindler wrote: > Hi > > Sorting does not work on indexed fields anymore (since Lucene 5), unless > you use UninvertingReader. Point values don't work with that because they > cannot be uninverted.
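A sketch of that three-field pattern against the Lucene 6.x document API; the field name is illustrative:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexWriter;

class TimestampFieldsSketch {
    static void add(IndexWriter writer, long millis) throws IOException {
        Document doc = new Document();
        doc.add(new LongPoint("timestamp", millis));             // range queries
        doc.add(new NumericDocValuesField("timestamp", millis)); // sorting
        doc.add(new StoredField("timestamp", millis));           // retrieving the value
        writer.addDocument(doc);
    }
}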
Re: Sorting IndexSearcher results by LongPoint with 6.0
Hi, Sorting does not work on indexed fields anymore (since Lucene 5), unless you use UninvertingReader. Point values don't work with that because they cannot be uninverted. For sorting it's the same rule for all field types: enable DocValues! You just have to add another field instance with same name using doc values (some numeric type). Uwe Am 26. Mai 2016 23:53:56 MESZ, schrieb Jeremy Friesen: >I'm attempting to upgrade my project to Lucene 6.0, and have run into >an >issue with sorting my results. -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de
Sorting IndexSearcher results by LongPoint with 6.0
I'm attempting to upgrade my project to Lucene 6.0, and have run into an issue with sorting my results. My documents have a timestamp field that was previously a StoredField with NumericType: Long. I've converted it to a LongPoint, which seems to work fine for range queries. My problem is that trying to sort search results with a SortField of type Long now doesn't seem to work with a LongPoint field. I get an IllegalStateException "unexpected docvalues type NONE for field 'timestamp' (expected=NUMERIC). Use UninvertingReader or index with docvalues." I'm guessing the sorter hasn't been updated to work with PointValues yet, but I just wanted to check with the mailing list to see if anyone else has found a way to do results sorting under 6.0.
Re: IndexReader returns all fields, but IndexSearcher does not
Hi - I suggest you narrow the problem down to a small self-contained example and if you still can't get it to work, show us the code. And tell us what version of Lucene you are using. -- Ian. On Mon, Jun 1, 2015 at 5:20 PM, Rahul Kotecha <kotecha.rahul...@gmail.com> wrote: > Hi All, I am trying to query an index.
IndexReader returns all fields, but IndexSearcher does not
Hi All, I am trying to query an index. When I try to read the index using IndexReader, I am able to print all the fields (close to 30 fields stored) in the index. However, when I run a query on the same index using IndexSearcher, I am able to get only a couple of fields instead of all the fields as returned by IndexReader. Any help would be greatly appreciated. Regards, Rahul Kotecha
IndexSearcher creation policy question
I've this scenario in a web application: 1. many users query a Lucene index concurrently (obvious) 2. one user can make several queries (she may have different browser windows open) 3. all those queries need to have a consistent paging behavior (next, previous buttons) 4. The index can be updated at any time by users. What I understand is that: - I need a fresh IndexSearcher for each initial query (DirectoryReader.open - reader - searcher) and cannot use Search(Lifetime)Manager's. - I cannot share IndexSearchers in the depicted scenario; even for the same user, a different IndexSearcher is needed for each window. Is my understanding true? What would be the best approach to handle this scenario? Kind regards, Rolf.
Re: IndexSearcher creation policy question
Your best bet is to use a searcher manager to manage the searcher instance, and only refresh the manager if writes are committed. This way the same searcher instances can be shared by multiple threads. For the paging, if you want to have a guaranteed consistent view, you have to keep around the searcher instance provided by the manager, and only release it once all the search/paging is done. But do remember to release it afterwards, otherwise you will quickly accumulate lots of unclosed old searcher instances. On Friday, August 22, 2014, Rolf Veen <rolf.v...@gmail.com> wrote: > I've this scenario in a web application: 1. many users query a Lucene index concurrently (obvious)
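A sketch of the acquire/page/release pattern just described; the names are illustrative and error handling is elided:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TopDocs;

class ConsistentPagingSketch {
    static void pageThrough(SearcherManager manager, Query query) throws IOException {
        IndexSearcher searcher = manager.acquire(); // pin one point-in-time view
        try {
            TopDocs page = searcher.search(query, 10);
            if (page.scoreDocs.length > 0) {
                ScoreDoc last = page.scoreDocs[page.scoreDocs.length - 1];
                // "next" must reuse the SAME pinned searcher for consistent paging
                TopDocs next = searcher.searchAfter(last, query, 10);
            }
        } finally {
            manager.release(searcher); // release only once all paging is done
        }
    }
}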
Re: absence of searchAfter method with Collector parameter in Lucene IndexSearcher
Thank you Hoss. I was exactly looking for something like TopFieldCollector.create(...). Basically my objective is to sort the documents by document number (I have a read-only index with only one segment, because of some other requirements). Here's what I did:

// create a sort field based on document number
SortField sortField = new SortField(null, Type.DOC);
// create a sort instance based on the sortField
Sort sort = new Sort(sortField);
// create a FieldDoc instance from the 'ScoreDoc after' instance
FieldDoc fieldDoc = new FieldDoc(after.doc, 0, new Object[] { after.doc });
// create a collector; this collector will be wrapped later on (but I am not showing that part here)
TopFieldCollector collector = TopFieldCollector.create(sort, numHits, fieldDoc, true, false, false, true);
// search the index
indexSearcher.search(query, collector);

Logically, everything should be working fine. But I get java.lang.ArrayIndexOutOfBoundsException: -1 all the time. The only part that looks problematic is the instance of FieldDoc. Since I have defined the sort to be based on document number in Lucene, my fieldDoc must contain the document number of the 'after' ScoreDoc, as per the documentation. But this is somehow not working. I would appreciate your suggestions. Best, -- Kailash Budhathoki On Sat, Jun 7, 2014 at 12:00 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote: > : I was wondering why there is no search method in lucene Indexsearcher to > : search after last reference by passing collector.
absence of searchAfter method with Collector parameter in Lucene IndexSearcher
Hi, I was wondering why there is no search method in Lucene's IndexSearcher to search after a last reference by passing a collector; say a method with a signature like searchAfter(Query query, ScoreDoc after, Collector results). For a normal search there are two ways to search: one by passing a collector and one by passing the number of hits. But the searchAfter method only supports the number of hits. Is this done deliberately, for some architectural reason? Thanking you. Best, -- Kailash Budhathoki
Re: absence of searchAfter method with Collector parameter in Lucene IndexSearcher
: I was wondering why there is no search method in lucene Indexsearcher to : search after last reference by passing collector. Say a method with : signature like searchAfter(Query query, ScoreDoc after, Collector results). searchAfter only makes sense if there is a Sort involved -- either explicitly or implicitly on score. When you use a Collector, even if your collector produces ScoreDoc objects, a subsequent (hypothetical) call searchAfter(Query,ScoreDoc,Collector) would have no idea what the meaning of after was for that ScoreDoc. (Even if the ScoreDoc was an instance of FieldDoc that encapsulated the values for the sort fields, it doesn't know what the fieldNames are, or what the comparator/direction to use against those field+values are to know what is after them.) So from an API standpoint: it just doesn't make any sense. If you want searchAfter functionality along with custom Collector logic, take a look at things like TopFieldCollector.create(...) which you could then wrap in your own Collector. -Hoss http://www.lucidworks.com/
Re: How to make good use of the multithreaded IndexSearcher?
Hi Benson, On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies <ben...@basistech.com> wrote: > The multithreaded index searcher fans out across segments. How aggressively does 'optimize' reduce the number of segments? If the segment count goes way down, is there some other way to exploit multiple cores? forceMerge[1], formerly known as optimize, takes a parameter that configures how many segments should remain in the index. Regarding multi-core usage: if your query load is high enough to use all your CPUs (there are always #cores queries running in parallel), there is generally no need to use the multi-threaded IndexSearcher. The multi-threaded IndexSearcher can however help in case all CPU power is not in use, or if you care more about latency than throughput. It indeed leverages the fact that the index is split into segments to parallelize query execution, so a fully merged index will actually run the query in a single thread in any case. There is no way to make query execution efficiently use several cores on a single-segment index, so if you really want to parallelize query execution, you will have to shard the index, to do at the index level what the multi-threaded IndexSearcher does at the segment level. Side notes: - A single-segment index only runs terms-dictionary-intensive queries more efficiently; it is generally discouraged to run forceMerge on an index unless this index is read-only. - The multi-threaded IndexSearcher only parallelizes query execution in certain cases. In particular, it never parallelizes execution when the method takes a collector. This means that if you want to use TotalHitCountCollector to count matches, you will have to do the parallelization by yourself. [1] http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/IndexWriter.html#forceMerge%28int%29 -- Adrien
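A sketch of how the multi-threaded IndexSearcher mentioned above is constructed (Lucene 4.x-era API; the pool sizing is illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

class ParallelSearcherSketch {
    static IndexSearcher create(IndexReader reader) {
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        // Query execution fans out one task per segment; methods that take a
        // Collector still run single-threaded, as the side notes above explain.
        // The caller owns the pool and must shut it down; IndexSearcher won't.
        return new IndexSearcher(reader, pool);
    }
}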
Re: How to make good use of the multithreaded IndexSearcher?
You might want to set a smallish maxMergedSegmentMB in TieredMergePolicy to force enough segments in the index ... sort of the opposite of optimizing. Really, IndexSearcher's approach of using one thread per segment is rather silly, and it's annoying/bad to expose changes in behavior due to segment structure. I think it'd be better to carve up the overall docID space into N virtual shards. Ie, if you have 100M docs, then one thread searches docs 0-10M, another 10M-20M, etc. Nobody has created such a searcher impl but it should not be hard, and it would be agnostic to the segment structure. But then again, this need (using concurrent hardware to reduce the latency of a single query) is somewhat rare; most apps are fine using the concurrency across queries rather than within one query. Mike McCandless http://blog.mikemccandless.com
Re: How to make good use of the multithreaded IndexSearcher?
Benson, Rather than forcing a random number of small segments into the index using maxMergedSegmentMB, it might be better to split your index into multiple shards. You can create a specific number of balanced shards to control the parallelism and then forceMerge each shard down to 1 segment to avoid spawning extra threads per shard. Once that's done, you just open all of the shards with a MultiReader and use that with the IndexSearcher and an ExecutorService (a sketch follows below). The downside to this is that it doesn't play nicely with near real-time search, but if you have a relatively static index that gets pushed to slaves periodically it gets the job done. As Mike said, it'd be nicer if there was a way to split the docID space into virtual shards, but it's not currently available. I'm not sure if anyone is even looking into it. Regards, Matt On Tue, Oct 1, 2013 at 7:09 AM, Michael McCandless <luc...@mikemccandless.com> wrote: > You might want to set a smallish maxMergedSegmentMB in TieredMergePolicy to force enough segments in the index ... sort of the opposite of optimizing.
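A sketch of the sharded setup Matt describes, under Lucene 4.x APIs; the shard directory names and count are hypothetical:

import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

class ShardedSearcherSketch {
    static IndexSearcher open(int numShards, ExecutorService pool) throws IOException {
        DirectoryReader[] shards = new DirectoryReader[numShards];
        for (int i = 0; i < numShards; i++) {
            // each shard was forceMerged down to one segment, so one thread per shard
            shards[i] = DirectoryReader.open(FSDirectory.open(new File("shard-" + i)));
        }
        return new IndexSearcher(new MultiReader(shards), pool);
    }
}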
Re: How to make good use of the multithreaded IndexSearcher?
On Tue, Oct 1, 2013 at 3:58 PM, Desidero <desid...@gmail.com> wrote: > Benson, Rather than forcing a random number of small segments into the index using maxMergedSegmentMB, it might be better to split your index into multiple shards. Thanks, folks, for all the help. I'm musing about the top-level issue here, which is whether the important case is many independent queries or the latency of just one. In the case where it's just one, we'll follow the shard-related advice.
How to make good use of the multithreaded IndexSearcher?
The multithreaded index searcher fans out across segments. How aggressively does 'optimize' reduce the number of segments? If the segment count goes way down, is there some other way to exploit multiple cores?
IndexSearcher using Collector
Hi, I have multiple indexes that I want to search against, so I am using a MultiReader for that. Along with this I also want all the matches to the query, so I am using the Collector class for this. The issue I am facing is that I am not able to know when all the matches are done, i.e. for each matching doc the collect function on the Collector class will be called, but how can I come to know when all the matches are done? The search function doesn't block. Is there any way to get this done? Thanks Amit
RE: IndexSearcher using Collector
Hi, The search function does block. IndexSearcher.search(Query, Collector) returns when all collecting is done. You can do the after-collect work after it returns. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
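For example, with the built-in TotalHitCountCollector (a sketch; any Collector behaves the same way with respect to blocking):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TotalHitCountCollector;

class BlockingCollectSketch {
    static int countMatches(IndexSearcher searcher, Query query) throws IOException {
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(query, collector); // returns only when collecting is done
        return collector.getTotalHits();   // safe to read: all matches have been seen
    }
}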
RE: Necessary to close() IndexSearcher in 4.X?
Hi, In Lucene before 4.0 there was a close method in IndexSearcher, because you were able to create an IndexSearcher using a Directory, which internally opened an IndexReader. This IndexReader had to be closed, so there was a need for IndexSearcher.close(). In 3.x this constructor (taking Directory/String/File) was deprecated and you now have to pass an already open IndexReader to the constructor. In 4.x this deprecated stuff was finally removed and IndexSearcher is only a thin wrapper around IndexReader, so you are responsible for opening/closing the IndexReader; IndexSearcher no longer does this. Your try-finally block must be around the IndexReader. But please note: keep the IndexReader open as long as possible, as it is very expensive to open/close them all the time. Uwe
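A minimal sketch of where the try-finally lands in 4.x, per the advice above:

import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;

class ReaderLifecycleSketch {
    static void searchOnce(Directory dir, Query query) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader); // thin wrapper, no close()
            searcher.search(query, 10);
        } finally {
            reader.close(); // the reader is what must be closed in 4.x
        }
    }
}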
Re: Necessary to close() IndexSearcher in 4.X?
Thanks for the feedback, Uwe. I'll not be looking at this again until tomorrow, so at least this gives me time to think it through. -- *Lewis*
Necessary to close() IndexSearcher in 4.X?
Hi, I am encountering many situations where searcher.close() is present in finally blocks such as:

} finally {
    if (searcher != null) {
        try {
            searcher.close();
        } catch (Exception ignore) {
        }
        searcher = null;
    }
}

Is some similar implementation still necessary in the 4.X API? Thank you very much Lewis -- *Lewis*
Re: How to get field names and types from an IndexSearcher
Just for the record, the solution that I adopted is as follows: - Create a setType(String field, String type) method and call it for any known numeric fields, before adding any document. This method saves the type definition in a file and also fills the Map<String,NumericConfig> that is passed to StandardQueryParser.setNumericConfigMap(). This approach has the drawback that the types must be known in advance (that is, it is a schema), but it's more robust than guessing the types from the documents themselves (as my initial request implied). Kind regards, Rolf. On Fri, Feb 1, 2013 at 3:22 PM, Rolf Veen <rolf.v...@gmail.com> wrote: On Fri, Feb 1, 2013 at 12:43 PM, Michael McCandless <luc...@mikemccandless.com> wrote: There is actually one way to check if a field was indexed numerically: you can seek to the first term in the field, and attempt to parse it as a long/float/etc., and if that throws a NumberFormatException, it was indexed numerically. Ie, numeric fields are indexed using the formats from oal.util.NumericUtils, which will not parse as normal numbers. This is what Lucene's FieldCache does to check how to decode numeric values when uninverting ... Very good info. Thank you, Mike.
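A hedged sketch of the NumericConfig wiring described above (Lucene 4.x flexible query parser; the field name "price" and the precision step are illustrative, and the exact NumericConfig constructor arguments should be checked against your version):

import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.FieldType.NumericType;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.queryparser.flexible.standard.config.NumericConfig;

class NumericConfigSketch {
    static StandardQueryParser build(Analyzer analyzer) {
        StandardQueryParser parser = new StandardQueryParser(analyzer);
        Map<String, NumericConfig> numerics = new HashMap<String, NumericConfig>();
        // "price" is typed as LONG, so price:1 and price:[1 TO 10] parse as numeric queries
        numerics.put("price", new NumericConfig(
            8, NumberFormat.getNumberInstance(Locale.ROOT), NumericType.LONG));
        parser.setNumericConfigMap(numerics);
        return parser;
    }
}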
Re: How to get field names and types from an IndexSearcher
On Thu, Jan 31, 2013 at 9:55 PM, Michael McCandless <luc...@mikemccandless.com> wrote: > But are you wanting to, eg, make a NumericRangeQuery if you detect the field was indexed numerically, and otherwise a TermRangeQuery, or something...? (Not easy) This is what I want, yes. But I begin to understand that this is not possible without storing additional metadata, as neither the index nor the documents preserve the type info (correct me if I'm wrong). On the other hand, once a field (name) has been typified (by auto-detection or a configuration file), in my case the field will maintain its type across documents and can thus be an index property. And since auto-detection is not very robust, I think I'll end up needing a schema or type definition after all (a field name to type mapping), which is not difficult to implement (or use Solr, I guess). Kind regards, Rolf
Re: How to get field names and types from an IndexSearcher
Getting the FieldInfos from each AtomicReader is the right approach! But, FieldInfos won't tell you which XXXField class was used for the indexing: that information is not fully preserved ... Mike McCandless http://blog.mikemccandless.com On Thu, Jan 31, 2013 at 6:33 AM, Rolf Veen <rolf.v...@gmail.com> wrote: > Hello, all. I want to get a list of field names and types out of an IndexSearcher or IndexReader (not necessarily Atomic). By type I mean if it was stored as StringField, LongField, etc. Is this possible? I could get the field names this way, probably not the simplest one to get a unified field list:

IndexReader reader = searcher.getIndexReader();
for (AtomicReaderContext rc : reader.leaves()) {
    AtomicReader ar = rc.reader();
    FieldInfos fis = ar.getFieldInfos();
    for (FieldInfo fi : fis)
        System.out.println(fi.name);
}

Kind regards, Rolf.
Re: How to get field names and types from an IndexSearcher
Thank you, Mike. I didn't state why I need this. I want to be able to send a query to some QueryParser that understands field:1 regardless of whether 'field' was added as StringField or LongField, for example. I do not want to rely on schema information if I can avoid it, and rather use a smart QueryParser. What would be the best approach to implement this? Kind regards, Rolf. On Thu, Jan 31, 2013 at 1:07 PM, Michael McCandless luc...@mikemccandless.com wrote: Getting the FieldInfos from each AtomicReader is the right approach! But, FieldInfos won't tell you which XXXField class was used for the indexing: that information is not fully preserved ... Mike McCandless http://blog.mikemccandless.com On Thu, Jan 31, 2013 at 6:33 AM, Rolf Veen rolf.v...@gmail.com wrote: Hello, all. I want to get a list of field names and types out of an IndexSearcher or IndexReader (not necessarily Atomic). By type I mean if it was stored as StringField, LongField, etc. Is this possible? I could get the field names this way, probably not the simplest one to get a unified field list: IndexReader reader = searcher.getIndexReader(); for (AtomicReaderContext rc : reader.leaves()) { AtomicReader ar = rc.reader(); FieldInfos fis = ar.getFieldInfos(); for (FieldInfo fi : fis) System.out.println(fi.name); } Kind regards, Rolf.
Re: How to get field names and types from an IndexSearcher
On Thu, Jan 31, 2013 at 7:31 AM, Rolf Veen rolf.v...@gmail.com wrote: Thank you, Mike. I didn't state why I need this. I want to be able to send a query to some QueryParser that understands field:1 regardless of whether 'field' was added as StringField or LongField, for example. I do not want to rely on schema information if I can avoid it, and rather use a smart QueryParser. What would be the best approach to implement this? But are you wanting to, eg, make a NumericRangeQuery if you detect the field was indexed numerically, and otherwise a TermRangeQuery, or something...? (Not easy) Or do you just want to recognize valid fields vs invalid ones? (Easy) Mike McCandless http://blog.mikemccandless.com
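As a reference point, here is a rough sketch of the first-term probe that Mike describes in the first message of this thread, assuming a Lucene 4.x reader. It is a heuristic only: any non-numeric term also fails to parse, and prefix-coded bytes may not even be valid UTF-8, so treat the result with care:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public final class NumericFieldSniffer {
    // Returns true if the field's first term does NOT parse as a plain number,
    // which is what NumericUtils-encoded (numeric) terms look like.
    public static boolean looksNumericallyIndexed(IndexReader reader, String field)
            throws IOException {
        Terms terms = MultiFields.getTerms(reader, field);
        if (terms == null) {
            return false; // field is not indexed at all
        }
        TermsEnum termsEnum = terms.iterator(null);
        BytesRef first = termsEnum.next();
        if (first == null) {
            return false; // no terms in this field
        }
        try {
            Long.parseLong(first.utf8ToString());
            return false; // parsed as a normal number, so not prefix-coded
        } catch (Exception e) {
            return true; // NumericUtils prefix-coded terms do not parse as normal numbers
        }
    }
}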
RE: How to properly refresh MultiReader IndexSearcher in Lucene 4.0-BETA
Hi, your code unfortunately will no longer work in later Lucene 4.0 releases. In general the simplest and correct way to do this is: - Manage your DirectoryReaders completely separately from each other in something like a pool of subindex readers (e.g. use some tool like SearcherManager to keep them alive, this is much easier than doing it yourself). Once you need to reopen one, just reopen it and save it. - On *every* search create a new MultiReader() [this costs nothing, as it is just a wrapper] and wrap it with a new IndexSearcher [this also costs you nothing, as it is also just a wrapper]. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Mossaab Bagdouri [mailto:bagdouri_moss...@yahoo.fr] Sent: Monday, August 27, 2012 7:37 PM To: java-user@lucene.apache.org Subject: How to properly refresh MultiReader IndexSearcher in Lucene 4.0-BETA Hi, The context is that I've migrated from Lucene 3.6 to Lucene 4.0-BETA. Lucene 3.6 had the convenient method IndexSearcher.isCurrent() for any underlying IndexReader, including MultiReader. This is no longer the case for Lucene 4.0-BETA. I've been suffering for the last 48h until I came up with this solution. I just want to share, and get feedback if any. The idea is to create a new instance of MultiReader, add the old current SubReaders and the new changed ones, refresh the IndexSearcher, then close the old out-of-date SubReaders.

private IndexSearcher getIndexSearcher() {
    try {
        if (is == null || is.getIndexReader().getRefCount() == 0) {
            DirectoryReader newReaders[] = new DirectoryReader[2];
            for (int i = 0; i < 2; i++) {
                newReaders[i] = DirectoryReader.open(MyFSDirectories.get(i));
            }
            is = new IndexSearcher(new MultiReader(newReaders));
        } else {
            MultiReader mr = (MultiReader) is.getIndexReader();
            List<DirectoryReader> oldReaders = (List<DirectoryReader>) mr.getSequentialSubReaders();
            DirectoryReader newReaders[] = new DirectoryReader[oldReaders.size()];
            Set<Integer> toClose = new HashSet<Integer>();
            for (int i = 0; i < oldReaders.size(); i++) {
                DirectoryReader oldDirectoryReader = oldReaders.get(i);
                if (oldDirectoryReader.isCurrent()) {
                    newReaders[i] = oldDirectoryReader;
                } else {
                    toClose.add(i);
                    newReaders[i] = DirectoryReader.openIfChanged(oldReaders.get(i));
                }
            }
            is = new IndexSearcher(new MultiReader(newReaders));
            for (int i : toClose) {
                oldReaders.get(i).close();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return is;
}

Regards, Mossaab
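A minimal sketch of the pattern Uwe recommends, assuming Lucene 4.0 APIs and two subindexes; real code should reference-count the pooled readers (as SearcherManager does) so an in-flight search is not cut off by a concurrent reopen:

import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public final class MultiIndexSearchPool {
    private DirectoryReader r1;
    private DirectoryReader r2;

    public MultiIndexSearchPool(Directory dir1, Directory dir2) throws IOException {
        r1 = DirectoryReader.open(dir1);
        r2 = DirectoryReader.open(dir2);
    }

    // Reopen any subreader that changed; cheap no-op otherwise.
    public synchronized void maybeRefresh() throws IOException {
        DirectoryReader n1 = DirectoryReader.openIfChanged(r1);
        if (n1 != null) { r1.close(); r1 = n1; }
        DirectoryReader n2 = DirectoryReader.openIfChanged(r2);
        if (n2 != null) { r2.close(); r2 = n2; }
    }

    // On every search: MultiReader and IndexSearcher are just wrappers, so
    // building fresh ones per request costs almost nothing.
    public synchronized IndexSearcher newSearcher() {
        MultiReader multi = new MultiReader(new DirectoryReader[] { r1, r2 }, false);
        return new IndexSearcher(multi);
    }
}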
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
On Sun, Feb 19, 2012 at 10:39 PM, Trejkaz trej...@trypticon.org wrote: On Mon, Feb 20, 2012 at 12:07 PM, Uwe Schindler u...@thetaphi.de wrote: See my response. The problem is not in Lucene; its in general a problem of fixed thread pools that execute other callables from within a callable running at the moment in the same thread pool. Callables are simply waiting for each other. What we do to get around this issue is to have a utility class which you call to submit jobs to the executor, but instead of waiting after submitting them, it starts calling get() starting from the end of the list. So if there is no other thread available on the executor, the main thread ends up doing all the work and then returns like normal. The problem with this solution is that it requires all code in the system to go through this utility to avoid the issue, and obviously Lucene is one of those things which isn't written to defend against this. Java 7's solution seems to be ForkJoinPool but I gather there is no simple way to use that with Lucene... I take it that a pool which rejects too much work (instead of blocking for a slot) is just as bad from a Lucene standpoint. TX - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Hanging with fixed thread pool in the IndexSearcher multithread code
3.5.0: I passed a fixed size executor service with one thread, and then with two threads, to the IndexSearcher constructor. It hung. With three threads, it didn't work, but I got different results than when I don't pass in an executor service at all. Is this expected? Should the javadoc say something? (I can make a patch). - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
On Sun, Feb 19, 2012 at 9:08 AM, Benson Margulies bimargul...@gmail.com wrote: 3.5.0: I passed a fixed size executor service with one thread, and then with two threads, to the IndexSearcher constructor. It hung. With three threads, it didn't work, but I got different results than when I don't pass in an executor service at all. Is this expected? Should the javadoc say something? (I can make a patch). I'm not sure I understand the details here, but I don't like the sound of 'different results': is it possible you can work this down into a test case that can be attached to jira? -- lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
I should have been clearer; the hang I can make into a test case, but I wondered if it would just get closed as 'works as designed'. The result discrepancy needs some investigation; I should not have mentioned it yet. On Feb 19, 2012, at 10:40 AM, Robert Muir rcm...@gmail.com wrote: On Sun, Feb 19, 2012 at 9:08 AM, Benson Margulies bimargul...@gmail.com wrote: 3.5.0: I passed a fixed size executor service with one thread, and then with two threads, to the IndexSearcher constructor. It hung. With three threads, it didn't work, but I got different results than when I don't pass in an executor service at all. Is this expected? Should the javadoc say something? (I can make a patch). I'm not sure I understand the details here, but I don't like the sound of 'different results': is it possible you can work this down into a test case that can be attached to jira? -- lucidimagination.com
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
and there was a dumb typo. 1 thread: hang 2 threads: hang 3 or more: no hang On Feb 19, 2012, at 10:40 AM, Robert Muir rcm...@gmail.com wrote: On Sun, Feb 19, 2012 at 9:08 AM, Benson Margulies bimargul...@gmail.com wrote: 3.5.0: I passed a fixed size executor service with one thread, and then with two threads, to the IndexSearcher constructor. It hung. With three threads, it didn't work, but I got different results than when I don't pass in an executor service at all. Is this expected? Should the javadoc say something? (I can make a patch). I'm not sure I understand the details here, but I don't like the sound of 'different results': is it possible you can work this down into a test case that can be attached to jira? -- lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
Conveniently, all the 'wrong-result' problems disappeared when I followed your advice about counting hits. On Sun, Feb 19, 2012 at 10:39 AM, Robert Muir rcm...@gmail.com wrote: On Sun, Feb 19, 2012 at 9:08 AM, Benson Margulies bimargul...@gmail.com wrote: 3.5.0: I passed a fixed size executor service with one thread, and then with two threads, to the IndexSearcher constructor. It hung. With three threads, it didn't work, but I got different results than when I don't pass in an executor service at all. Is this expected? Should the javadoc say something? (I can make a patch). I'm not sure I understand the details here, but I don't like the sound of 'different results': is it possible you can work this down into a test case that can be attached to jira? -- lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
See https://issues.apache.org/jira/browse/LUCENE-3803 for an example of the hang. I think this nets out to pilot error, but maybe Javadoc could protect the next person from making the same mistake. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
RE: Hanging with fixed thread pool in the IndexSearcher multithread code
See my response. The problem is not in Lucene; it's in general a problem of fixed thread pools that execute other callables from within a callable running at the moment in the same thread pool. Callables are simply waiting for each other. Use a separate thread pool for Lucene (or whenever you execute new callables from within another running callable). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Benson Margulies [mailto:bimargul...@gmail.com] Sent: Monday, February 20, 2012 1:47 AM To: java-user@lucene.apache.org Subject: Re: Hanging with fixed thread pool in the IndexSearcher multithread code See https://issues.apache.org/jira/browse/LUCENE-3803 for an example of the hang. I think this nets out to pilot error, but maybe Javadoc could protect the next person from making the same mistake.
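A minimal sketch of that advice, assuming Lucene 3.5's IndexSearcher(IndexReader, ExecutorService) constructor; the pool size is arbitrary:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public final class SearcherWithOwnPool {
    public static IndexSearcher open(Directory dir) throws Exception {
        // Give Lucene its own pool; never hand it the application pool whose
        // tasks are themselves blocked waiting on search results.
        ExecutorService lucenePool = Executors.newFixedThreadPool(4);
        IndexReader reader = IndexReader.open(dir);
        // Segment searches will fan out over lucenePool; remember to shut the
        // pool down when the searcher is closed.
        return new IndexSearcher(reader, lucenePool);
    }
}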
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
On Sun, Feb 19, 2012 at 8:07 PM, Uwe Schindler u...@thetaphi.de wrote: See my response. The problem is not in Lucene; its in general a problem of fixed thread pools that execute other callables from within a callable running at the moment in the same thread pool. Callables are simply waiting for each other. Use a separate thread pool for Lucene (or whenever you execute new callables from within another running callable) Right. There's nothing like coding a test case to cast one's stupid errors into high relief. Sorry for all the noise. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Benson Margulies [mailto:bimargul...@gmail.com] Sent: Monday, February 20, 2012 1:47 AM To: java-user@lucene.apache.org Subject: Re: Hanging with fixed thread pool in the IndexSearcher multithread code See https://issues.apache.org/jira/browse/LUCENE-3803 for an example of the hang. I think this nets out to pilot error, but maybe Javadoc could protect the next person from making the same mistake. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Hanging with fixed thread pool in the IndexSearcher multithread code
On Mon, Feb 20, 2012 at 12:07 PM, Uwe Schindler u...@thetaphi.de wrote: See my response. The problem is not in Lucene; its in general a problem of fixed thread pools that execute other callables from within a callable running at the moment in the same thread pool. Callables are simply waiting for each other. What we do to get around this issue is to have a utility class which you call to submit jobs to the executor, but instead of waiting after submitting them, it starts calling get() starting from the end of the list. So if there is no other thread available on the executor, the main thread ends up doing all the work and then returns like normal. The problem with this solution is that it requires all code in the system to go through this utility to avoid the issue, and obviously Lucene is one of those things which isn't written to defend against this. Java 7's solution seems to be ForkJoinPool but I gather there is no simple way to use that with Lucene... TX - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Filter and IndexSearcher in Lucene 4.0 (trunk)
Hi, I apologise upfront for the trivial question. I have an IndexSearcher and I am applying a FieldCacheTermsFilter filter on it to only retrieve documents whose single docId is in a provided set of allowed docIds. I am particularly interested in the stats being estimated over the accepted set of documents. However, the filtering is not working. Am I missing something here? h. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
RE: Filter and IndexSearcher in Lucene 4.0 (trunk)
What's the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Hany Azzam [mailto:h...@eecs.qmul.ac.uk] Sent: Friday, February 10, 2012 6:43 PM To: java-user@lucene.apache.org Subject: Re: Filter and IndexSearcher in Lucene 4.0 (trunk) Hi, I apologise upfront for the trivial question. I have an IndexSearcher and I am applying a FieldCacheTermsFilter filter on it to only retrieve documents whose single docId is in a provided set of allowed docIds. I am particularly interested in the stats being estimated over the accepted set of documents. However, the filtering is not working. Am I missing something here? h.
Re: Filter and IndexSearcher in Lucene 4.0 (trunk)
See, the question was so trivial that you actually missed it :) The problem is that the docs are filtered (which is great) but the stats (BasicStats) aren't, i.e. the stats have been calculated over the whole index and not just a selected set of documents. For example: Filter filter = new FieldCacheTermsFilter(QNO, queryNumber); searcher.search(qq, filter, collector); stats.getNumberOfDocuments(); I only want to consider certain docs per query. The filter achieves that in terms of matching and the returned results. However, the score for each document has been calculated using the stats over the whole index and not just the filtered documents. Is there a way to calculate the stats only over the filtered documents? I hope the problem is a bit clearer now. Thank you. h. On 10 Feb 2012, at 18:27, Uwe Schindler wrote: What's the problem? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Hany Azzam [mailto:h...@eecs.qmul.ac.uk] Sent: Friday, February 10, 2012 6:43 PM To: java-user@lucene.apache.org Subject: Re: Filter and IndexSearcher in Lucene 4.0 (trunk) Hi, I apologise upfront for the trivial question. I have an IndexSearcher and I am applying a FieldCacheTermsFilter filter on it to only retrieve documents whose single docId is in a provided set of allowed docIds. I am particularly interested in the stats being estimated over the accepted set of documents. However, the filtering is not working. Am I missing something here? h.
Re: IndexSearcher with two Indexes
Hi, I have two indexes. One that contains all the documents in the collection and the other contains only the relevant documents. I am using Lucene 4.0 and the new SimilarityBase class to build my retrieval models (similarity functions). One of the retrieval models requires statistics to be computed across both of the indexes. How can an IndexSearcher use the two indexes at the same time to compute different components of the retrieval model? Is that possible? Thank you very much, Hany
Re: IndexSearcher with two Indexes
On Fri, Jan 27, 2012 at 3:21 PM, Hany Azzam h...@eecs.qmul.ac.uk wrote: Hi, I have two indexes. One that contains all the documents in the collection and the other contains only the relevant documents. I am using Lucene 4.0 and the new SimilariyBase class to build my retrieval models (similarity functions). One of the retrieval models requires statistics to be computed across both of the indexes. How can an IndexSearcher use the two indexes at the same time to compute different components of the retrieval model? Is that possible? you can make a multireader over the two indexreaders, then make an indexsearcher over that multireader... or are you trying to do something else? -- lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
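A short sketch of that suggestion, assuming Lucene 4.0 APIs; the directory paths are placeholders:

import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public final class TwoIndexSearcher {
    public static IndexSearcher open() throws Exception {
        IndexReader all = DirectoryReader.open(FSDirectory.open(new File("/indexes/all")));
        IndexReader relevant = DirectoryReader.open(FSDirectory.open(new File("/indexes/relevant")));
        // One logical index over both; term/collection statistics are then
        // aggregated across the two subreaders.
        return new IndexSearcher(new MultiReader(all, relevant));
    }
}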
Re: IndexSearcher with two Indexes
Hi Robert, Thanks for the reply. I am trying to do something different. If I use a multireader then the searching/scoring will take place over the two indexes at the same time. However, in my case the subcomponents of the retrieval model are calculated over separate evidence spaces. For example, the retrieval model calculates something like this: score := P(query_term | documents) * P(query_term | relevant_documents) The P(query_term | documents) can be estimated using the index over the whole collection of documents. The P(query_term | relevant_documents) can be estimated using the index over the relevant documents only (which are known prior to the execution of the query). The question is: can I do such a calculation, which uses two separate indexes, in one scoring function? Of course one option is to use the MultiSimilarity class and combine the scores somehow. However, the retrieval function is more complex than that and a simple combination using product or summation won't be feasible. Any ideas on how to resolve this problem (if possible :))? Thanks again, h. On 27 Jan 2012, at 20:29, Robert Muir wrote: On Fri, Jan 27, 2012 at 3:21 PM, Hany Azzam h...@eecs.qmul.ac.uk wrote: Hi, I have two indexes. One that contains all the documents in the collection and the other contains only the relevant documents. I am using Lucene 4.0 and the new SimilarityBase class to build my retrieval models (similarity functions). One of the retrieval models requires statistics to be computed across both of the indexes. How can an IndexSearcher use the two indexes at the same time to compute different components of the retrieval model? Is that possible? you can make a multireader over the two indexreaders, then make an indexsearcher over that multireader... or are you trying to do something else? -- lucidimagination.com
Re: IndexSearcher with two Indexes
On Fri, Jan 27, 2012 at 4:53 PM, Hany Azzam h...@eecs.qmul.ac.uk wrote: Hi Robert, Thanks for the reply. I am trying to do something different. If I use a mutireader then the searching/scoring will take place over the two indexes at the same time. However, in my case the subcomponents of the retrieval model are calculated over separate evidence spaces. For example, the retrieval model calculates something like that: score := P(query_term | documents) * P(query_term | relevant_documents) The P(query_term | documents) can be estimated using the index over the whole collection of documents. The P(query_term | relevant_documents) can be estimated using the index over the relevant documents only (which are known prior to the execution of the query). In this situation, if you want to combine the statistics from different indexes in your own way, you can look at IndexSearcher.termStatistics() and IndexSearcher.collectionStatistics(). These are intended for situations like distributed search, but maybe you can make use of them. here is some pseudocode: IndexReader relevant = IndexReader.open(relevantDirectory); IndexReader documents = IndexReader.open(documentsDirectory); final IndexSearcher relevantSearcher = new IndexSearcher(relevant); IndexSearcher documentsSearcher = new IndexSearcher(documents) { @Override public CollectionStatistics collectionStatistics(String field) throws IOException { CollectionStatistics documentStats = super.collectionStatistics(field); return new CollectionStatistics(... someCombinationOf(documentStats + stuff from relevantSearcher)); } // do a similar thing for termStatistics() }; documentsSearcher.search(...) -- lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
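To make the pseudocode slightly more concrete, here is a hedged sketch that simply sums the statistics of the two indexes, assuming Lucene 4.0's CollectionStatistics constructor; relevantReader and documentsReader are placeholders, whether summing is the right combination depends entirely on the retrieval model, and real code must handle the -1 "statistic not available" sentinels:

import java.io.IOException;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.IndexSearcher;

final IndexSearcher relevantSearcher = new IndexSearcher(relevantReader);
IndexSearcher documentsSearcher = new IndexSearcher(documentsReader) {
    @Override
    public CollectionStatistics collectionStatistics(String field) throws IOException {
        CollectionStatistics d = super.collectionStatistics(field);
        CollectionStatistics r = relevantSearcher.collectionStatistics(field);
        // naive combination: add the counts from both evidence spaces
        return new CollectionStatistics(field,
                d.maxDoc() + r.maxDoc(),
                d.docCount() + r.docCount(),
                d.sumTotalTermFreq() + r.sumTotalTermFreq(),
                d.sumDocFreq() + r.sumDocFreq());
    }
    // termStatistics(Term, TermContext) can be combined the same way
};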
Re: RE: Question about FilterIndexReader and IndexSearcher
Hi, I'm a student at Southeast University in China. Thank you for your help, but I still can't filter the deleted docs. I made a test demo; please tell me why the following procedure gives such a result. Why would IndexSearcher ignore the deleted docs cached in FilterIndexReader? zhouzhou 2011-06-27 From: Uwe Schindler Sent: 2011-06-26 19:05:11 To: java-user@lucene.apache.org CC: Subject: RE: Question about FilterIndexReader and IndexSearcher Hi, usage of FilterIndexReader is not always as easy as it seems. There are several problems that can easily lead to a situation where your FilterIndexReader implements all document filtering, but IndexSearcher does not respect it. I have no idea what you are doing, but the following things need to be done to correctly filter documents: - FilterIndexReader should implement the isDeleted() method too (I assume you did this) - FilterIndexReader should filter the postings returned: termPositions(...) and termDocs(...) to exclude deleted documents - return the correct number for numDocs() The biggest problem since Lucene 2.9 is one specific method that will circumvent all you had done above: getSequentialSubReaders() is used by IndexSearcher to directly pass the searches to all atomic segments of a MultiReader/DirectoryReader structure. As the subreaders returned by this method do not implement the above (they are passed as-is by the default impl), IndexSearcher will in fact only talk to them and so ignore the above methods on the top-level reader. To do this correctly, do one of the following: - easy: override getSequentialSubReaders() to return null; this will make the filtered IndexReader itself atomic, so IndexSearcher will use it during search. The downside: searches may get significantly slower - override getSequentialSubReaders() and also wrap each subreader returned by the delegate reader with your impl. If you implement the last option (but also the return-null option) you may also override reopen(), to correctly wrap reopened segments - you need to do this if you use reopen. If you are already using Lucene trunk (the coming version 4.0), you can follow this issue: https://issues.apache.org/jira/browse/LUCENE-3212 It will implement exactly the above once I have time to finally do it. I will post a first patch soon. This version will not work with Lucene 3.x, as it is lots of work to get all this running easily with Lucene 3.x (especially the above termPositions, termDocs methods). In Lucene 4.0 the filtering of documents is much easier: you only have to override getDeletedDocs() and numDocs(), everything else is handled automatically! Hope that helps. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: 周洲 [mailto:zhou518z...@gmail.com] Sent: Sunday, June 26, 2011 7:08 AM To: java-user Subject: Question about FilterIndexReader and IndexSearcher Hello, I want the IndexReader to pick up modifications in a timely way, so I use MyFilterIndexReader (which extends FilterIndexReader) to cache the deleted documents in RAM. When this FilterIndexReader is passed as the argument of an IndexSearcher, I found that the IndexSearcher does not filter the deleted documents. So I want to know how IndexSearcher and FilterIndexReader should be used so that deleted documents are filtered?
zhouzhou -- 2011-06-26
RE: Question about FilterIndexReader and IndexSearcher
Hi, usage of FilterIndexReader is not always as easy as it seems. There are several problems that can easily lead to a situation where your FilterIndexReader implements all document filtering, but IndexSearcher does not respect it. I have no idea what you are doing, but the following things need to be done to correctly filter documents: - FilterIndexReader should implement the isDeleted() method too (I assume you did this) - FilterIndexReader should filter the postings returned: termPositions(...) and termDocs(...) to exclude deleted documents - return the correct number for numDocs() The biggest problem since Lucene 2.9 is one specific method that will circumvent all you had done above: getSequentialSubReaders() is used by IndexSearcher to directly pass the searches to all atomic segments of a MultiReader/DirectoryReader structure. As the subreaders returned by this method do not implement the above (they are passed as-is by the default impl), IndexSearcher will in fact only talk to them and so ignore the above methods on the top-level reader. To do this correctly, do one of the following: - easy: override getSequentialSubReaders() to return null; this will make the filtered IndexReader itself atomic, so IndexSearcher will use it during search. The downside: searches may get significantly slower - override getSequentialSubReaders() and also wrap each subreader returned by the delegate reader with your impl. If you implement the last option (but also the return-null option) you may also override reopen(), to correctly wrap reopened segments - you need to do this if you use reopen. If you are already using Lucene trunk (the coming version 4.0), you can follow this issue: https://issues.apache.org/jira/browse/LUCENE-3212 It will implement exactly the above once I have time to finally do it. I will post a first patch soon. This version will not work with Lucene 3.x, as it is lots of work to get all this running easily with Lucene 3.x (especially the above termPositions, termDocs methods). In Lucene 4.0 the filtering of documents is much easier: you only have to override getDeletedDocs() and numDocs(), everything else is handled automatically! Hope that helps. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: 周洲 [mailto:zhou518z...@gmail.com] Sent: Sunday, June 26, 2011 7:08 AM To: java-user Subject: Question about FilterIndexReader and IndexSearcher Hello, I want the IndexReader to pick up modifications in a timely way, so I use MyFilterIndexReader (which extends FilterIndexReader) to cache the deleted documents in RAM. When this FilterIndexReader is passed as the argument of an IndexSearcher, I found that the IndexSearcher does not filter the deleted documents. So I want to know how IndexSearcher and FilterIndexReader should be used so that deleted documents are filtered? zhouzhou -- 2011-06-26
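A minimal sketch of Uwe's "easy" option for Lucene 3.x; the RAM-cached delete set is the poster's idea, the class name is illustrative, and a complete version would also filter termDocs()/termPositions() as described above:

import java.util.Set;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;

public class RamDeletesReader extends FilterIndexReader {
    private final Set<Integer> ramDeletes; // doc IDs deleted in RAM but not yet committed

    public RamDeletesReader(IndexReader in, Set<Integer> ramDeletes) {
        super(in);
        this.ramDeletes = ramDeletes;
    }

    @Override
    public IndexReader[] getSequentialSubReaders() {
        // Returning null makes this reader atomic, so IndexSearcher consults
        // our overrides instead of the unwrapped subreaders (the easy option).
        return null;
    }

    @Override
    public boolean hasDeletions() {
        return !ramDeletes.isEmpty() || in.hasDeletions();
    }

    @Override
    public boolean isDeleted(int docID) {
        return ramDeletes.contains(docID) || in.isDeleted(docID);
    }

    @Override
    public int numDocs() {
        return in.numDocs() - ramDeletes.size();
    }
}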
Question about FilterIndexReader and IndexSearcher
Hello, I want the IndexReader to pick up modifications in a timely way, so I use MyFilterIndexReader (which extends FilterIndexReader) to cache the deleted documents in RAM. When this FilterIndexReader is passed as the argument of an IndexSearcher, I found that the IndexSearcher does not filter the deleted documents. So I want to know how IndexSearcher and FilterIndexReader should be used so that deleted documents are filtered? zhouzhou -- 2011-06-26
Re: Lucene: Indexsearcher: java.lang.UnsupportedOperationException
java.lang.UnsupportedOperationException at org.apache.lucene.search.Query.createWeight(Query.java:88) at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:185) at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:360) at org.apache.lucene.search.Query.weight(Query.java:95) at org.apache.lucene.search.Searcher.createWeight(Searcher.java:185) at org.apache.lucene.search.Searcher.search(Searcher.java:136) at NVoting.<init>(NVoting.java:159) at Main.main(Main.java:8) On 20 April 2011 05:25, Anshum ansh...@gmail.com wrote: Could you also print and send the entire stack-trace? Also, the query.toString() -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Apr 19, 2011 at 7:40 PM, Patrick Diviacco patrick.divia...@gmail.com wrote: I get the following error message: java.lang.UnsupportedOperationException with the Lucene search method: topDocs = searcher.search(booleanQuery, null, 100); I'm using an old version of Lucene: Lucene 2.4.1 (I cannot upgrade!) Can you help me understand why I get such an error? Thanks. This is the complete code: http://pastie.org/1811677
Lucene: Indexsearcher: java.lang.UnsupportedOperationException
I get the following error message: java.lang.UnsupportedOperationException with the Lucene search method: topDocs = searcher.search(booleanQuery, null, 100); I'm using an old version of Lucene: Lucene 2.4.1 (I cannot upgrade!) Can you help me understand why I get such an error? Thanks. This is the complete code: http://pastie.org/1811677
Re: Lucene: Indexsearcher: java.lang.UnsupportedOperationException
Could you also print and send the entire stack-trace? Also, the query.toString() -- Anshum Gupta http://ai-cafe.blogspot.com On Tue, Apr 19, 2011 at 7:40 PM, Patrick Diviacco patrick.divia...@gmail.com wrote: I get the following error message: java.lang.UnsupportedOperationException with Lucene search method: topDocs = searcher.search(booleanQuery, null, 100); I'm using an old version of Lucene: Lucene 2.4.1 (I cannot upgrade!) Can you help me to understand why I get such error ? thanks This is the complete code: http://pastie.org/1811677
IndexSearcher Single Instance Bottleneck?
I currently have two types of searches on my website that are using the same index and the same instance of IndexSearcher. One of the searches usually takes only 50-100 milliseconds, but the second usually takes 2 seconds. It seems as though when someone does the second search and another user does the first search immediately after, the first search will wait for the second to complete. Is that how Lucene works, or am I just looking at my test wrong? If so, how should I solve this issue? Two indexes or two index searchers?
Re: IndexSearcher Single Instance Bottleneck?
No, Lucene itself shouldn't be doing this, the recommendation is for multiple threads to share a single searcher. I'd first look upstream, are your requests being processed serially? I.e. is there a single thread that's handling requests? Best Erick On Thu, Mar 10, 2011 at 4:25 PM, RobM rmcclana...@databanq.com wrote: I currently have two types of searches on my website that are using the same index and same instance of index searcher. One of the searches usually only takes 50 - 100 milliseconds but the second usually takes 2 seconds. It seems as though when someone does the second search and another user does the first search immediately after the first search will wait for the second to complete. Is that how Lucene works or am I just looking at my test wrong. If so how should i solve this issue? Two indexes or two index searchers? -- View this message in context: http://lucene.472066.n3.nabble.com/IndexSearcher-Single-Instance-Bottleneck-tp2662376p2662376.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
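Erick's point, that a single shared IndexSearcher is safe for concurrent use, can be sanity-checked with a sketch like this (Lucene 3.x; the queries are placeholders supplied by the caller). If requests really are handled concurrently, the fast query should not wait for the slow one:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public final class ConcurrentSearchCheck {
    public static void run(final IndexSearcher shared, final Query fast, final Query slow)
            throws InterruptedException {
        ExecutorService requests = Executors.newFixedThreadPool(2);
        requests.submit(new Runnable() {
            public void run() {
                try {
                    long t0 = System.currentTimeMillis();
                    TopDocs hits = shared.search(slow, 100); // safe from many threads
                    System.out.println("slow: " + hits.totalHits + " hits in "
                            + (System.currentTimeMillis() - t0) + " ms");
                } catch (Exception e) { e.printStackTrace(); }
            }
        });
        requests.submit(new Runnable() {
            public void run() {
                try {
                    long t0 = System.currentTimeMillis();
                    TopDocs hits = shared.search(fast, 100);
                    System.out.println("fast: " + hits.totalHits + " hits in "
                            + (System.currentTimeMillis() - t0) + " ms");
                } catch (Exception e) { e.printStackTrace(); }
            }
        });
        requests.shutdown();
        requests.awaitTermination(60, TimeUnit.SECONDS);
    }
}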
shared IndexSearcher (lucene 3.0.3)
Hi all, in our project we're using Lucene in Tomcat. To avoid some overhead we have a shared IndexSearcher instance. In the past we hit "too many open files" errors many times. To prevent this, the IndexSearcher is closed and reopened after indexing. The shared instance is not closed anywhere else in the code. Is this the right way of preventing these kinds of errors? Thanks in advance for your answers, Ákos Tajti
Re: shared IndexSearcher (lucene 3.0.3)
Hey, the "too many open files" problem can be prevented by raising the limit of open files ;) there is a nice summary on the FAQ you might wanna look at: http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_an_IOException_that_says_.22Too_many_open_files.22.3F if you have further questions just come back here! Simon On Fri, Feb 25, 2011 at 2:11 PM, Akos Tajti akos.ta...@gmail.com wrote: Hi all, in our project we're using Lucene in Tomcat. To avoid some overhead we have a shared IndexSearcher instance. In the past we hit "too many open files" errors many times. To prevent this, the IndexSearcher is closed and reopened after indexing. The shared instance is not closed anywhere else in the code. Is this the right way of preventing these kinds of errors? Thanks in advance for your answers, Ákos Tajti
Re: Newbie: Life span of IndexWriter / IndexSearcher?
Look at the JavaDoc: http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#reopen() The *reopen* method returns a *new reader* if the index has changed since the original reader was opened. So, you should do something like this: IndexReader newReader = reader.reopen(true); if (newReader != reader) { reader.close(); reader = newReader; searcher = new IndexSearcher(reader); } instead of reader.reopen(true); Bye. *Raf* On Sun, Jan 16, 2011 at 11:06 AM, sol myr solmy...@yahoo.com wrote: Hi, Thank you kindly for replying. Unfortunately, reopen() doesn't help me see the changes. Here's my test: First I write and commit a document, and run a search - which correctly finds this document. Then I write and commit another document, re-open the reader and run another search - this should find 2 documents, but it only finds 1 document (the first one). BTW if instead of 'reader.reopen()' I instantiate a brand-new searcher (and reader), it correctly finds 2 documents...

// Shared objects:
Directory directory = FSDirectory.open(new File("c:/myDir"));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.LIMITED);
Query query = new TermQuery(new Term("title", "hello"));
// Write document #1:
writer.addDocument(makeDoc("hello world 1")); // Field "title" = "hello world 1"
writer.commit();
// First search (yields document #1 as expected):
IndexReader reader = IndexReader.open(directory, true);
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs results1 = searcher.search(query, 1);
printResults(searcher, results1);
// Write document #2:
writer.addDocument(makeDoc("hello world 2")); // Field "title" = "hello world 2"
writer.commit();
// Reopen reader, and search (should yield 2 documents, but I only see 1):
reader.reopen(true);
TopDocs results2 = searcher.search(query, 1);
printResults(searcher, results2);

--- On Thu, 1/13/11, Uwe Schindler u...@thetaphi.de wrote: From: Uwe Schindler u...@thetaphi.de Subject: RE: Newbie: Life span of IndexWriter / IndexSearcher? To: java-user@lucene.apache.org Date: Thursday, January 13, 2011, 7:40 AM You can leave the IndexWriter and IndexSearcher open all the time. The only important thing: changes made by IndexWriter's commit() method are only seen by IndexSearcher when the underlying IndexReader is reopened (e.g. by using IndexReader.reopen()) - please note that this only works with direct access to the IndexReaders, so I would recommend using the constructors of IndexSearcher that take IndexReaders (the Directory ones are only for easy beginner's use).
Re: Newbie: Life span of IndexWriter / IndexSearcher?
Worked like a charm - thanks a lot. --- On Sun, 1/16/11, Raf r.ventag...@gmail.com wrote: From: Raf r.ventag...@gmail.com Subject: Re: Newbie: Life span of IndexWriter / IndexSearcher? To: java-user@lucene.apache.org Date: Sunday, January 16, 2011, 3:16 AM Look at the JavaDoc: http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#reopen() The *reopen* method returns a *new reader* if the index has changed since the original reader was opened. So, you should do something like this: IndexReader newReader = reader.reopen(true); if (newReader != reader) { reader.close(); reader = newReader; searcher = new IndexSearcher(reader); } instead of reader.reopen(true); Bye. *Raf* On Sun, Jan 16, 2011 at 11:06 AM, sol myr solmy...@yahoo.com wrote: Hi, Thank you kindly for replying. Unfortunately, reopen() doesn't help me see the changes. Here's my test: First I write and commit a document, and run a search - which correctly finds this document. Then I write and commit another document, re-open the reader and run another search - this should find 2 documents, but it only finds 1 document (the first one). BTW if instead of 'reader.reopen()' I instantiate a brand-new searcher (and reader), it correctly finds 2 documents...

// Shared objects:
Directory directory = FSDirectory.open(new File("c:/myDir"));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.LIMITED);
Query query = new TermQuery(new Term("title", "hello"));
// Write document #1:
writer.addDocument(makeDoc("hello world 1")); // Field "title" = "hello world 1"
writer.commit();
// First search (yields document #1 as expected):
IndexReader reader = IndexReader.open(directory, true);
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs results1 = searcher.search(query, 1);
printResults(searcher, results1);
// Write document #2:
writer.addDocument(makeDoc("hello world 2")); // Field "title" = "hello world 2"
writer.commit();
// Reopen reader, and search (should yield 2 documents, but I only see 1):
reader.reopen(true);
TopDocs results2 = searcher.search(query, 1);
printResults(searcher, results2);

--- On Thu, 1/13/11, Uwe Schindler u...@thetaphi.de wrote: From: Uwe Schindler u...@thetaphi.de Subject: RE: Newbie: Life span of IndexWriter / IndexSearcher? To: java-user@lucene.apache.org Date: Thursday, January 13, 2011, 7:40 AM You can leave the IndexWriter and IndexSearcher open all the time. The only important thing: changes made by IndexWriter's commit() method are only seen by IndexSearcher when the underlying IndexReader is reopened (e.g. by using IndexReader.reopen()) - please note that this only works with direct access to the IndexReaders, so I would recommend using the constructors of IndexSearcher that take IndexReaders (the Directory ones are only for easy beginner's use).
Re: Can not delete index file after close the IndexSearcher
Try adding try { searcher.close(); } catch (Exception e) { } before searcher = new IndexSearcher(dir); at the top of the loop. At the end of a loop searcher is open, and is not closed before being reassigned. There is probably a better solution along the lines of only opening a new searcher if you need to. -- Ian. 2011/1/13 张志田 zhitian.zh...@dianping.com: Hi Yuhan, dir.close() can not solve the problem. The reason I have to close the old searcher is that my program will replace the old index; the code posted here is just a scenario to simplify my question. Thanks, Garry On 13 January 2011 at 10:45 AM, Yuhan Zhang yzh...@onescreen.com wrote: Hi Garry, I am guessing the directory needs to be closed before opening a new one. dir.close(); dir = FSDirectory.open(new File(getIndexPath())); why not open two IndexSearcher objects in an array of two instead of swapping them back and forth? it would be a lot easier. yuhan 2011/1/12 张志田 zhitian.zh...@dianping.com Hi Mike, Sorry to make you confused. "lock" means the file handle is held by some other process; the program can not delete it. There is no exception; I can see that file.delete() returns false. If I delete the cfs file in the OS manually, the warning is "File is in use by another person or program". To simplify my question, I made some more code for testing. You can run it to reproduce: after two loops, you will see a message like "Can not delete file: D:\index\index2\_0.cfs". Thank you very much

public class SearchTest {
    private static final int MAX_RESULT = 1;
    private String indexPath1 = "D:\\index\\index1";
    private String indexPath2 = "D:\\index\\index2";
    private String backupIndexpath = "D:\\index\\index3";
    private String indexPath = indexPath1;
    private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    private IndexSearcher searcher;

    public void search() {
        while (true) {
            try {
                String keyword = "test";
                String fieldName = "searchfield";
                Directory dir = FSDirectory.open(new File(indexPath));
                searcher = new IndexSearcher(dir);
                QueryParser queryParse = new QueryParser(Version.LUCENE_30, fieldName, analyzer);
                Query query = queryParse.parse(keyword);
                TopDocs hits = searcher.search(query, MAX_RESULT);
                int size = 5;
                if (hits.scoreDocs.length < size) { size = hits.scoreDocs.length; }
                for (int i = 0; i < size; i++) {
                    Document doc = searcher.doc(hits.scoreDocs[i].doc);
                    String text = doc.get(fieldName);
                    System.out.println("fieldContent is: " + text);
                }
                IndexSearcher oldSearcher = searcher;
                File newFile = new File(getIndexPath());
                for (File file : newFile.listFiles()) {
                    if (!file.delete()) {
                        System.out.println("Can not delete file: " + file.getAbsolutePath());
                    }
                }
                // Copy index files from another folder to this folder
                copyDir(new File(backupIndexpath), newFile);
                Directory newDir = FSDirectory.open(newFile);
                IndexSearcher newSearcher = new IndexSearcher(newDir);
                searcher = newSearcher;
                oldSearcher.close();
                System.out.println("Closed Searcher: " + oldSearcher.getIndexReader().directory().toString());
                System.out.println("input 'Q' to quit testing...");
                BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
                if (br.readLine().trim().equals("Q")) { break; }
            } catch (CorruptIndexException e) { e.printStackTrace(); }
            catch (IOException e) { e.printStackTrace(); }
            catch (ParseException e) { e.printStackTrace(); }
        }
    }

    private String getIndexPath() {
        if (indexPath.equals(indexPath1)) { indexPath = indexPath2; }
        else { indexPath = indexPath1; }
        return indexPath;
    }

    public static void copyDir(File sourceLocation, File targetLocation) throws IOException {
        String[] children =
sourceLocation.list(); for (int i = 0; i < children.length; i++) { InputStream in = null; OutputStream out = null
Re: Can not delete index file after close the IndexSearcher
Ian, thanks for your response. Your suggestion worked for me. What does oldSearcher.close() do in my code? Why do I have to close the searcher and oldSearcher together? In my opinion, oldSearcher held index1 while searcher held index2; they are using different resources, and the resources held by them should be released separately. I have another concern with your solution: searcher is a reference created here for user searching outside of this code snippet. If I close and reopen it here, there may be some service downtime because there is no open searcher to use. In my original code, searcher stayed open all the time, so there is little or no service downtime; this is the reason I did not close it every time. Do you have any suggestion to keep an alive searcher while the program can also switch the index smoothly? Thanks, Garry On 13 January 2011 at 5:47 PM, Ian Lea ian@gmail.com wrote: Try adding try { searcher.close(); } catch (Exception e) { } before searcher = new IndexSearcher(dir); at the top of the loop. At the end of a loop searcher is open, and is not closed before being reassigned. There is probably a better solution along the lines of only opening a new searcher if you need to. -- Ian. 2011/1/13 张志田 zhitian.zh...@dianping.com: Hi Yuhan, dir.close() can not solve the problem. The reason I have to close the old searcher is that my program will replace the old index; the code posted here is just a scenario to simplify my question. Thanks, Garry On 13 January 2011 at 10:45 AM, Yuhan Zhang yzh...@onescreen.com wrote: Hi Garry, I am guessing the directory needs to be closed before opening a new one. dir.close(); dir = FSDirectory.open(new File(getIndexPath())); why not open two IndexSearcher objects in an array of two instead of swapping them back and forth? it would be a lot easier. yuhan 2011/1/12 张志田 zhitian.zh...@dianping.com Hi Mike, Sorry to make you confused. "lock" means the file handle is held by some other process; the program can not delete it. There is no exception; I can see that file.delete() returns false. If I delete the cfs file in the OS manually, the warning is "File is in use by another person or program". To simplify my question, I made some more code for testing. You can run it to reproduce: after two loops, you will see a message like
Can not delete file: D:\index\index2\_0.cfs Thank you very much public class SearchTest { private static final int MAX_RESULT = 1; private String indexPath1 = D:\\index\\index1; private String indexPath2 = D:\\index\\index2; private String backupIndexpath = D:\\index\\index3; private String indexPath = indexPath1; private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30); private IndexSearcher searcher; public void search() { while (true) { try { String keyword = test; String fieldName = searchfield; Directory dir = FSDirectory.open(new File(indexPath)); searcher = new IndexSearcher(dir); QueryParser queryParse = new QueryParser(Version.LUCENE_30, fieldName, analyzer); Query query = queryParse.parse(keyword); TopDocs hits = searcher.search(query, MAX_RESULT); int size = 5; if (hits.scoreDocs.length size) { size = hits.scoreDocs.length; } for (int i = 0; i size; i++) { Document doc = searcher.doc(hits.scoreDocs[i].doc); String text = doc.get(fieldName); System.out.println(fieldContent is: + text); } IndexSearcher oldSearcher = searcher; File newFile = new File(getIndexPath()); for (File file : newFile.listFiles()) { if (!file.delete()) { System.out.println(Can not delete file: + file.getAbsolutePath()); } } // Copy index File from another folder to this folder copyDir(new File(backupIndexpath), newFile); Directory newDir = FSDirectory.open(newFile); IndexSearcher newSearcher = new IndexSearcher(newDir); searcher = newSearcher; oldSearcher.close(); System.out.println(Closed Searcher: + oldSearcher.getIndexReader().directory().toString()); System.out.println(input 'Q' to quit testing...); BufferedReader br = new BufferedReader(new InputStreamReader(System.in
Re: Can not delete index file after close the IndexSearcher
As I said, there is probably a better solution. At the moment you are opening searchers at the top and bottom of the loop and on second and subsequent passes you are not closing the bottom one, that you've only just opened, before opening a new one using the same instance variable. The resources of the bottom one would presumably be released eventually by GC, but evidently not soon enough, Replace the top searcher = new IndexSearcher(dir); line with if (needToOpenNewSearcher()) { ... } where the logic in needToOpenNewSearcher() is for you to write. -- Ian. 2011/1/13 张志田 zhitian.zh...@dianping.com: Ian, thanks for your response. Your suggestion worked for me. What does oldSearcher.close() do in my code? why I have to close the searcher and oldSearcher together? In my opinion, oldSearcher held index1 while searcher held index2, they are using different resources, the resources held by them should be released seperately. I have another concern for your solution, searcher is a reference created here for user searching out of this code snippet, if I closed and reopen it here, there may be some service down time because there is no open searcher for using. In my original code, searcher opened all the time, so there is no service down time or little, this is the reason I did not close it every time. Do you have any suggestion to keep an alive searcher and the program can also switch the index smoothly? Thanks, Garry 在 2011年1月13日 下午5:47,Ian Lea ian@gmail.com写道: Try adding try { searcher.close(); } catch (Exception e) { } before searcher = new IndexSearcher(dir); at the top of the loop. At the end of a loop searcher is open, and is not closed before being reassigned. There is probably a better solution along the lines of only opening new searcher if need to. -- Ian. 2011/1/13 张志田 zhitian.zh...@dianping.com: Hi Yuhan, dir.close() can not solve the problem. The reason I have to close the old searcher is my program will replace the old index, the code posted here is just a scenario to simplify my question. Thanks, Garry 在 2011年1月13日 上午10:45,Yuhan Zhang yzh...@onescreen.com写道: Hi Garry, I am guessing the directory needs to be closed before opening a new one. dir.close(); dir = FSDirectory.open(new File(getIndexPath())); why not to open two IndexSearcher objects in an array of two instead of swapping them back and forth? it would be a lot easier. yuhan 2011/1/12 张志田 zhitian.zh...@dianping.com Hi Mike, Sorry to make you confused. lock means the file handle is held by some other progress, the program can not delete it. There is no exception, I can see file.delete() method returns false. If I delete the cfs file in the OS manually, the warning is File was using by another person or program To simplify my question, I made some more code for testing. you can run it for reproducing, after two loops, you will see the message e.g. 
Can not delete file: D:\index\index2\_0.cfs Thank you very much public class SearchTest { private static final int MAX_RESULT = 1; private String indexPath1 = D:\\index\\index1; private String indexPath2 = D:\\index\\index2; private String backupIndexpath = D:\\index\\index3; private String indexPath = indexPath1; private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30); private IndexSearcher searcher; public void search() { while (true) { try { String keyword = test; String fieldName = searchfield; Directory dir = FSDirectory.open(new File(indexPath)); searcher = new IndexSearcher(dir); QueryParser queryParse = new QueryParser(Version.LUCENE_30, fieldName, analyzer); Query query = queryParse.parse(keyword); TopDocs hits = searcher.search(query, MAX_RESULT); int size = 5; if (hits.scoreDocs.length size) { size = hits.scoreDocs.length; } for (int i = 0; i size; i++) { Document doc = searcher.doc(hits.scoreDocs[i].doc); String text = doc.get(fieldName); System.out.println(fieldContent is: + text); } IndexSearcher oldSearcher = searcher; File newFile = new File(getIndexPath()); for (File file : newFile.listFiles()) { if (!file.delete()) { System.out.println(Can not delete file: + file.getAbsolutePath
Re: Can not delete index file after close the IndexSearcher
Ian, thank you very much. I will try to change my switch solution. Thanks again Garry 在 2011年1月13日 下午6:41,Ian Lea ian@gmail.com写道: As I said, there is probably a better solution. At the moment you are opening searchers at the top and bottom of the loop and on second and subsequent passes you are not closing the bottom one, that you've only just opened, before opening a new one using the same instance variable. The resources of the bottom one would presumably be released eventually by GC, but evidently not soon enough, Replace the top searcher = new IndexSearcher(dir); line with if (needToOpenNewSearcher()) { ... } where the logic in needToOpenNewSearcher() is for you to write. -- Ian. 2011/1/13 张志田 zhitian.zh...@dianping.com: Ian, thanks for your response. Your suggestion worked for me. What does oldSearcher.close() do in my code? why I have to close the searcher and oldSearcher together? In my opinion, oldSearcher held index1 while searcher held index2, they are using different resources, the resources held by them should be released seperately. I have another concern for your solution, searcher is a reference created here for user searching out of this code snippet, if I closed and reopen it here, there may be some service down time because there is no open searcher for using. In my original code, searcher opened all the time, so there is no service down time or little, this is the reason I did not close it every time. Do you have any suggestion to keep an alive searcher and the program can also switch the index smoothly? Thanks, Garry 在 2011年1月13日 下午5:47,Ian Lea ian@gmail.com写道: Try adding try { searcher.close(); } catch (Exception e) { } before searcher = new IndexSearcher(dir); at the top of the loop. At the end of a loop searcher is open, and is not closed before being reassigned. There is probably a better solution along the lines of only opening new searcher if need to. -- Ian. 2011/1/13 张志田 zhitian.zh...@dianping.com: Hi Yuhan, dir.close() can not solve the problem. The reason I have to close the old searcher is my program will replace the old index, the code posted here is just a scenario to simplify my question. Thanks, Garry 在 2011年1月13日 上午10:45,Yuhan Zhang yzh...@onescreen.com写道: Hi Garry, I am guessing the directory needs to be closed before opening a new one. dir.close(); dir = FSDirectory.open(new File(getIndexPath())); why not to open two IndexSearcher objects in an array of two instead of swapping them back and forth? it would be a lot easier. yuhan 2011/1/12 张志田 zhitian.zh...@dianping.com Hi Mike, Sorry to make you confused. lock means the file handle is held by some other progress, the program can not delete it. There is no exception, I can see file.delete() method returns false. If I delete the cfs file in the OS manually, the warning is File was using by another person or program To simplify my question, I made some more code for testing. you can run it for reproducing, after two loops, you will see the message e.g. 
Re: Can not delete index file after closing the IndexSearcher
In fact it's probably as simple as if (searcher == null) { searcher = new IndexSearcher(dir); } at the top of the loop. -- Ian.

2011/1/13 Ian Lea <ian@gmail.com>: As I said, there is probably a better solution. At the moment you are opening searchers at the top and bottom of the loop, and on second and subsequent passes you are not closing the bottom one, which you have only just opened, before opening a new one using the same instance variable. Replace the top searcher = new IndexSearcher(dir); line with if (needToOpenNewSearcher()) { ... } where the logic in needToOpenNewSearcher() is for you to write. -- Ian.
Newbie: Life span of IndexWriter / IndexSearcher?
Hi, We're writing a web application, which naturally needs:
- an IndexSearcher when users use our search screen
- an IndexWriter in a background process that periodically updates and optimizes our index.
Note our writer is exclusive - no other applications/threads ever write to our index files. What's the common practice in terms of resource creation and sharing? Specifically:
1) Should I have a single IndexSearcher to serve all (concurrent) users? I saw such a recommendation in a tutorial, but discovered that an open IndexSearcher prevents 'optimize' from merging my files... so should I close it just before optimization? Or should I open an individual (short-lived) IndexSearcher for each search request?
2) Our tests also imply that IndexWriter.optimize() takes effect only after you close() that writer - which is a shame, because I hoped to keep using the same writer (I hear it's expensive to instantiate). Am I doing something wrong?
Thanks
RE: Newbie: Life span of IndexWriter / IndexSearcher?
1) Should I have a single IndexSearcher to serve all (concurrent) users? I saw such a recommendation in a tutorial, but discovered that an open IndexSearcher prevents 'optimize' from merging my files... so should I close it just before optimization? Or should I open an individual (short-lived) IndexSearcher for each search request?

You can leave the IndexWriter and IndexSearcher open all the time. The only important thing: changes made by IndexWriter's commit() method are only seen by the IndexSearcher when the underlying IndexReader is reopened (e.g. by using IndexReader.reopen()) - please note that this only works with direct access to the IndexReaders, so I would recommend using the constructors of IndexSearcher that take IndexReaders (the Directory ones are only for easy beginner's use). See Lucene in Action, second edition, for a good example of a searcher manager.

2) Our tests also imply that IndexWriter.optimize() takes effect only after you close() that writer - which is a shame, because I hoped to keep using the same writer (I hear it's expensive to instantiate). Am I doing something wrong?

This is wrong, see above. As the IndexReader/Searcher keeps the segments in use from the time it was opened, they can't go away until that snapshot view of the IndexReader is closed. In general, it's not recommended to optimize indexes since 2.9, unless you are doing things like deleting all documents. Uwe
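For reference, the reopen pattern Uwe describes might look like this - a minimal sketch against the Lucene 3.x API, with illustrative class and method names, not a drop-in implementation:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ReopeningSearcher {
    private IndexReader reader;
    private IndexSearcher searcher;

    public ReopeningSearcher(File indexDir) throws IOException {
        Directory dir = FSDirectory.open(indexDir);
        reader = IndexReader.open(dir);        // read-only by default in 3.x
        searcher = new IndexSearcher(reader);  // the IndexReader constructor, as recommended
    }

    // Call periodically (e.g. from a background thread) after the writer commits.
    public synchronized void maybeReopen() throws IOException {
        IndexReader newReader = reader.reopen(); // cheap no-op if nothing changed
        if (newReader != reader) {
            reader.close(); // unsafe if searches are still in flight; see the
                            // reference-counting discussion later in this digest
            reader = newReader;
            searcher = new IndexSearcher(newReader);
        }
    }

    public synchronized IndexSearcher getSearcher() {
        return searcher;
    }
}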
Re: Closing indexsearcher, making sure it is in use
Use something with reference counting - Lucene in Action, second edition, has a searcher manager class which I think might be available standalone. Or a couple of low-tech alternatives: instead of closing the old searcher, move it out of the way, keep a reference to it, and close it after n seconds or searches or whatever. Or catch the closed exception and rerun the query with the up-to-date searcher. -- Ian.

On Thu, Jan 13, 2011 at 8:21 PM, Paul Taylor <paul_t...@fastmail.fm> wrote: As recommended, I use just one IndexSearcher in my multithreaded GUI app, using a singleton pattern. If data is modified in the index I then close the reader and searcher, and they will be recreated on the next call to getInstance(). But I've hit a problem whereby one thread was closing a searcher while another thread already had the searcher open, and when it came to use it, it got the exception 'the IndexReader is closed'. I obviously don't want access to the searcher to be synchronized, as it is designed to work multithreaded, so how should I close it safely, i.e. close it only if there are no current references to it? Paul
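The second low-tech alternative Ian mentions - park the old searcher and close it after a grace period - could be sketched like this (the Timer-based approach and the 30-second delay are illustrative, not from Lucene):

import java.io.IOException;
import java.util.Timer;
import java.util.TimerTask;

import org.apache.lucene.search.IndexSearcher;

public class DeferredCloser {
    private final Timer timer = new Timer(true); // daemon thread, won't block JVM exit

    // Instead of closing the retired searcher immediately, give any thread
    // that already holds a reference a grace period to finish its query.
    public void retire(final IndexSearcher oldSearcher) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    oldSearcher.close();
                } catch (IOException e) {
                    // log and move on; nothing useful to do here
                }
            }
        }, 30 * 1000L);
    }
}

This is only a heuristic - a query that runs longer than the grace period will still see 'the IndexReader is closed' - which is why the reference-counting approach is the robust one.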
Re: Closing indexsearcher, making sure it is in use
You can use ReadWriteLock <http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html> as a low-level technique to manage access. A ReadWriteLock maintains a pair of associated locks, one for read-only operations and one for writing. The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive. Wrap Lucene's searcher in your own SearchManager class, which exposes its own API for search and forwards the requests to the underlying searcher. The search and reopen then sync up via the ReadWriteLock: search takes the read lock and reopen takes the write lock. PS: Use indexreader.reopen() instead of closing it and opening again. It is much faster. (Documented.) Thanks & Regards, Umesh Prasad

On Fri, Jan 14, 2011 at 2:25 AM, Ian Lea <ian@gmail.com> wrote: Use something with reference counting - Lucene in Action, second edition, has a searcher manager class which I think might be available standalone.
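A sketch of what that wrapper might look like (Lucene 3.x API; the class name and structure are illustrative):

import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class LockedSearchManager {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private IndexReader reader;
    private IndexSearcher searcher;

    public LockedSearchManager(IndexReader reader) {
        this.reader = reader;
        this.searcher = new IndexSearcher(reader);
    }

    // Many threads may search concurrently under the shared read lock.
    public TopDocs search(Query query, int n) throws IOException {
        lock.readLock().lock();
        try {
            return searcher.search(query, n);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Reopen takes the exclusive write lock, so no search can see a closed reader.
    public void reopen() throws IOException {
        lock.writeLock().lock();
        try {
            IndexReader newReader = reader.reopen(); // much faster than a fresh open()
            if (newReader != reader) {
                reader.close(); // safe: no read locks are held right now
                reader = newReader;
                searcher = new IndexSearcher(newReader);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}

Holding the read lock for the duration of each search is exactly what makes the write lock safe: reopen() can only run when no query is mid-flight.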
Can not delete index file after closing the IndexSearcher
Dear Luceners, I'm using lucene-3.0.2 in our app. There is some testing code for switching indexes; however, when my code runs a couple of times, I find the index files are locked and I can not delete the old index files. The code looks like:

public class SearchTest {

    private static final int MAX_RESULT = 1;

    private String indexPath1 = "D:\\index\\index1";
    private String indexPath2 = "D:\\index\\index2";
    private String indexPath = indexPath1;

    private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    private Directory dir = null;
    private IndexSearcher searcher;

    public void search() {
        while (true) {
            try {
                String keyword = "test";
                String fieldName = "searchfield";
                if (dir == null) {
                    dir = FSDirectory.open(new File(indexPath));
                }
                searcher = new IndexSearcher(dir);
                QueryParser queryParse = new QueryParser(Version.LUCENE_30, fieldName, analyzer);
                Query query = queryParse.parse(keyword);
                TopDocs hits = searcher.search(query, MAX_RESULT);
                int size = 5;
                if (hits.scoreDocs.length < size) {
                    size = hits.scoreDocs.length;
                }
                for (int i = 0; i < size; i++) {
                    Document doc = searcher.doc(hits.scoreDocs[i].doc);
                    String text = doc.get(fieldName);
                    System.out.println("fieldContent is: " + text);
                }
                IndexSearcher oldSearcher = searcher;
                dir = FSDirectory.open(new File(getIndexPath()));
                IndexSearcher newSearcher = new IndexSearcher(dir);
                searcher = newSearcher;
                oldSearcher.close();
                System.out.println("Closed Searcher: " + oldSearcher.getIndexReader().directory().toString());
                System.out.println("input 'Q' to quit testing...");
                BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
                if (br.readLine().trim().equals("Q")) {
                    break;
                }
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }
    }

    private String getIndexPath() {
        if (indexPath.equals(indexPath1)) {
            indexPath = indexPath2;
        } else {
            indexPath = indexPath1;
        }
        return indexPath;
    }

    public static void main(String[] args) {
        SearchTest searchTest = new SearchTest();
        searchTest.search();
    }
}

Can anybody take a look at the above code snippet? I want to search on a different index each time, so I created two different folders and switch between them from time to time. The index files in index1/index2 may be replaced before the search request comes. The problem I found is that after I ran the above code for 2 or more loops, I can not modify/delete the cfs/cfx files in the file system (Windows 2003), although I closed the searcher every time in the code. It seems that the index files are not released. Is the problem caused by the shared reference to the searcher? Or some shared thread in Lucene? Thanks in advance! Garry
Re: Can not delete index file after closing the IndexSearcher
When you break out of the loop (user enters 'Q') you don't close the current searcher. Could that be it? Also you are calling FSDir.open each time but should only do it once (though this should be harmless). Mike

On Wed, Jan 12, 2011 at 5:39 AM, 张志田 <zhitian.zh...@dianping.com> wrote: Dear Luceners, I'm using lucene-3.0.2 in our app. There is some testing code for switching indexes; however, when my code runs a couple of times, I find the index files are locked and I can not delete the old index files.
Re: Can not delete index file after closing the IndexSearcher
Mike, thanks for your feedback. I verified this in debug mode, checking the folder I closed in the last loop. Actually, both folders are locked. I tried a new FSDirectory every loop - no help. Garry

2011/1/12 Michael McCandless <luc...@mikemccandless.com>: When you break out of the loop (user enters 'Q') you don't close the current searcher. Could that be it? Also you are calling FSDir.open each time but should only do it once (though this should be harmless). Mike
Re: Can not delete index file after closing the IndexSearcher
Hmmm. When you say 'locked', what does that actually mean? Can you post the exception? Also, can you whittle down your example even more? E.g. if calling this method twice causes the problem, make a method that calls it twice and hits the exception, and then start simplifying from there... Mike

2011/1/12 张志田 <zhitian.zh...@dianping.com>: Mike, thanks for your feedback. I verified this in debug mode, checking the folder I closed in the last loop. Actually, both folders are locked. I tried a new FSDirectory every loop - no help. Garry
Re: Can not delete index file after closing the IndexSearcher
Hi Mike, Sorry to make you confused. 'Locked' means the file handle is held by some other process and the program can not delete the file. There is no exception; I can see the file.delete() method returns false. If I delete the cfs file in the OS manually, the warning is that the file is in use by another person or program. To simplify my question, I made some more code for testing. You can run it to reproduce; after two loops, you will see a message like:

Can not delete file: D:\index\index2\_0.cfs

Thank you very much

public class SearchTest {

    private static final int MAX_RESULT = 1;

    private String indexPath1 = "D:\\index\\index1";
    private String indexPath2 = "D:\\index\\index2";
    private String backupIndexpath = "D:\\index\\index3";
    private String indexPath = indexPath1;

    private Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    private IndexSearcher searcher;

    public void search() {
        while (true) {
            try {
                String keyword = "test";
                String fieldName = "searchfield";
                Directory dir = FSDirectory.open(new File(indexPath));
                searcher = new IndexSearcher(dir);
                QueryParser queryParse = new QueryParser(Version.LUCENE_30, fieldName, analyzer);
                Query query = queryParse.parse(keyword);
                TopDocs hits = searcher.search(query, MAX_RESULT);
                int size = 5;
                if (hits.scoreDocs.length < size) {
                    size = hits.scoreDocs.length;
                }
                for (int i = 0; i < size; i++) {
                    Document doc = searcher.doc(hits.scoreDocs[i].doc);
                    String text = doc.get(fieldName);
                    System.out.println("fieldContent is: " + text);
                }
                IndexSearcher oldSearcher = searcher;
                File newFile = new File(getIndexPath());
                for (File file : newFile.listFiles()) {
                    if (!file.delete()) {
                        System.out.println("Can not delete file: " + file.getAbsolutePath());
                    }
                }
                // Copy index files from another folder to this folder
                copyDir(new File(backupIndexpath), newFile);
                Directory newDir = FSDirectory.open(newFile);
                IndexSearcher newSearcher = new IndexSearcher(newDir);
                searcher = newSearcher;
                oldSearcher.close();
                System.out.println("Closed Searcher: " + oldSearcher.getIndexReader().directory().toString());
                System.out.println("input 'Q' to quit testing...");
                BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
                if (br.readLine().trim().equals("Q")) {
                    break;
                }
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (ParseException e) {
                e.printStackTrace();
            }
        }
    }

    private String getIndexPath() {
        if (indexPath.equals(indexPath1)) {
            indexPath = indexPath2;
        } else {
            indexPath = indexPath1;
        }
        return indexPath;
    }

    public static void copyDir(File sourceLocation, File targetLocation) throws IOException {
        String[] children = sourceLocation.list();
        for (int i = 0; i < children.length; i++) {
            InputStream in = null;
            OutputStream out = null;
            try {
                in = new FileInputStream(new File(sourceLocation, children[i]));
                out = new FileOutputStream(new File(targetLocation, children[i]));
                byte[] buf = new byte[1024];
                int len;
                while ((len = in.read(buf)) > 0) {
                    out.write(buf, 0, len);
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            } finally {
                try {
                    if (in != null) {
                        in.close();
                    }
                    if (out != null) {
                        out.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void main(String[] args) {
        SearchTest searchTest = new SearchTest();
        searchTest.search();
    }
}
Re: Can not delete index file after closing the IndexSearcher
Hi Garry, I am guessing the directory needs to be closed before opening a new one: dir.close(); dir = FSDirectory.open(new File(getIndexPath())); Why not open two IndexSearcher objects in an array of two, instead of swapping them back and forth? It would be a lot easier. yuhan

2011/1/12 张志田 <zhitian.zh...@dianping.com>: Hi Mike, Sorry to make you confused. 'Locked' means the file handle is held by some other process and the program can not delete the file. There is no exception; I can see the file.delete() method returns false.
Re: Can not delete index file after closing the IndexSearcher
Hi Yuhan, dir.close() can not solve the problem. The reason I have to close the old searcher is that my program will replace the old index; the code posted here is just a scenario to simplify my question. Thanks, Garry

On 13 January 2011 at 10:45, Yuhan Zhang <yzh...@onescreen.com> wrote: Hi Garry, I am guessing the directory needs to be closed before opening a new one: dir.close(); dir = FSDirectory.open(new File(getIndexPath())); Why not open two IndexSearcher objects in an array of two, instead of swapping them back and forth? It would be a lot easier. yuhan
Weird document equals and hash through IndexReader and IndexSearcher
Hi, I have a weird result: if I access the same document through the IndexReader or the IndexSearcher, the two Document objects are not equal and have different hash values:

Document doc1 = indexSearcher.doc(i);
Document doc2 = indexSearcher.getIndexReader().document(i);
System.out.println("Equal: " + doc1.equals(doc2) + ", Hash: " + doc1.hashCode() + ", " + doc2.hashCode() + ", num: " + i);

I'm using Lucene 3.0.2. (No multithreading; nobody is simultaneously updating the index.) What am I missing? Thanks, Carmit (Could you please forward your answers to my private address as well?)
RE: Weird document equals and hash through IndexReader and IndexSearcher
Hi Carmit, equals and hashCode are not implemented for oal.document.Document, so two instances never compare equal to each other. The same happens if you retrieve the document two times from the same IndexReader. - Uwe Schindler, H.-H.-Meier-Allee 63, D-28213 Bremen, http://www.thetaphi.de, eMail: u...@thetaphi.de
Re: Weird document equals and hash through IndexReader and IndexSearcher
Thanks, Uwe! Indeed you're right! Whenever IndexReader.document() is called, a new Document instance is created. And since the Document class does not override equals and hashCode, I can't know whether the same doc was retrieved. And since Document is final, I can only write a wrapper for it. Is this an oversight, or intentional? In any case, it's not too convenient... Carmit
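Since Document offers no usable equals/hashCode, the wrapper Carmit mentions could simply compare documents by an application-level key - a minimal sketch, assuming every indexed document stores a unique "id" field (the field name is illustrative):

import org.apache.lucene.document.Document;

public final class DocKey {
    private final String id;

    public DocKey(Document doc) {
        this.id = doc.get("id"); // assumes each document stores a unique "id" field
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DocKey && id != null && id.equals(((DocKey) o).id);
    }

    @Override
    public int hashCode() {
        return id == null ? 0 : id.hashCode();
    }
}

Within a single IndexReader snapshot, comparing the int doc numbers (the i passed to doc(i)) is an even cheaper way to tell whether two retrievals refer to the same document.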
Re: Does an IndexSearcher call incRef on the underlying reader?
On Wed, Oct 27, 2010 at 1:01 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote: 1st of all, great book. Thank you! @Question3: It sounds like an IndexReader always starts with a count of zero, but that should not be a cause for worry because the value only gets acted upon in a call to decRef() ... am I right?

Actually, the refCount of a new IndexReader starts at 1. Then the caller must call close (which under the hood calls decRef) to drop it to 0.

@Question4: It seems to me that, based on your explanation so far, the IndexReader will end up closing after the very first search. That doesn't sound too efficient, given that keeping it alive and kicking is something that is highly desirable ... no? Am I missing something, or does that responsibility fall elsewhere?

Actually, no -- the SearcherManager also holds a ref. So when there are no queries in flight, the refCount will be 1. It's only when the searcher is swapped out for a new one that we decRef the old one, and its refCount drops to 0 (once all in-flight queries finish).

I hope I haven't hijacked my own thread? I don't think so! Mike
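The lifecycle Mike describes can be watched directly with getRefCount(); here is a small self-contained sketch against the Lucene 3.x API (the RAMDirectory setup is only there to make it runnable):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class RefCountDemo {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        // create an empty index so the reader can open
        new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true,
                IndexWriter.MaxFieldLength.UNLIMITED).close();

        IndexReader reader = IndexReader.open(dir); // refCount == 1
        reader.incRef();  // e.g. a searcher manager takes a reference -> 2
        reader.incRef();  // an in-flight query pins the reader        -> 3
        System.out.println(reader.getRefCount());   // prints 3
        reader.decRef();  // the query finishes                        -> 2
        reader.decRef();  // the manager swaps the searcher out        -> 1
        reader.close();   // drops the open() reference -> 0, resources released
    }
}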
Re: Blocking on IndexSearcher search
Uwe Schindler wrote: It uses address space for mapping the files into virtual memory (like a swap file) - this is why it only works well for 64-bit VMs. The used physical memory depends on your OS cache configuration; Java heap is not used for that (in contrast to copying a file to a RAMDirectory). Uwe

Went to try NIO and then realised it is no better on Windows; in fact the Sun bug seems to be saying that multiple file channels work better than sharing one, so perhaps that is what was happening when I had multiple IndexReaders, and why performance was actually better on Windows in that circumstance. As customers could be using 32-bit or 64-bit, I fear that MMap is not a very robust solution. Oh well, have to live with it for now I suppose. Paul
Blocking on IndexSearcher search
Hi, My multithreaded code was always creating a new IndexSearcher for every search, but I changed over to the recommendation of creating just one IndexSearcher and keeping it between searches. Now I find that if I have multiple threads trying to search, they block on the search() method - only one can search at any time. Is this expected behaviour? Paul
RE: Blocking on IndexSearcher search
Can you show us where exactly it blocks (e.g. use Ctrl-Break on Windows to print a thread dump)? IndexSearcher's methods are not synchronized and concurrent access is easily possible; all concurrent access is managed by the underlying IndexReader. Maybe you synchronize somewhere in your own code? - Uwe Schindler, H.-H.-Meier-Allee 63, D-28213 Bremen, http://www.thetaphi.de, eMail: u...@thetaphi.de
Re: Blocking on IndexSearcher search
Uwe Schindler wrote: Can you show us where exactly it blocks (e.g. use Ctrl-Break on Windows to print a thread dump)? Maybe you synchronize somewhere in your own code?

I'm picking this up using the YourKit profiler. In the thread view it says: blocked on org.apache.lucene.search.Searcher.search(Query, Filter, int). On the monitor profiling page it says: Blocked thread: was blocked on monitor of class org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor. Is the file system the problem? I'm creating the index using:

Directory directory = FSDirectory.open(new File(INDEX_NAME));
IndexWriter writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

and my IndexSearcher is created as:

IndexSearcher is = new IndexSearcher(directory, true);

Paul
Re: Blocking on IndexSearcher search
Uwe Schindler wrote: That lock contention is fine there, as this is the central point where all IO is done. This does not mean that only one query is running in parallel; the queries are still running in parallel, but there is one place where all IO is waiting on one file descriptor. This is no different with multiple IndexSearchers; YourKit simply shows this place as it has the most contention. You are using Windows? On Linux it should use NIO automatically (FSDir.open() uses platform-specific defaults). You can also improve speed and play with e.g. MMapDirectory on 64-bit platforms, or try out how NIO works on your platform.

I'm using Windows and I'll try NIO, good idea. My app is already memory hungry in other areas, so I guess MMap is a no-go - does it use heap or perm memory? I understand the lock-on-IO point, but what was concerning me is that in the thread view the threads were blocking for some time, not just a couple of milliseconds. I actually refactored my code to make it multithreaded specifically for this bit of code because a lot of searches were necessary, and the elapsed time is faster than using a single thread, but not as fast as I'd hoped. Paul
RE: Blocking on IndexSearcher search
I'm using Windows and I'll try NIO, good idea. My app is already memory hungry in other areas, so I guess MMap is a no-go - does it use heap or perm memory?

It uses address space for mapping the files into virtual memory (like a swap file) - this is why it only works well for 64-bit VMs. The used physical memory depends on your OS cache configuration; Java heap is not used for that (in contrast to copying a file to a RAMDirectory). Uwe
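Pulling the thread's advice together, an explicit Directory choice might look like this - a sketch only, since FSDirectory.open() already picks a sensible platform default (the is64BitVm/isWindows flags are illustrative stand-ins for whatever detection the application uses):

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.store.SimpleFSDirectory;

public class DirectoryChoice {
    public static Directory open(File path, boolean is64BitVm, boolean isWindows) throws IOException {
        if (is64BitVm) {
            // Plenty of address space: map the index into virtual memory.
            // Uses neither Java heap nor perm-gen memory, only address space.
            return new MMapDirectory(path);
        }
        if (isWindows) {
            // NIO positional reads are synchronized on Windows (Sun bug 6265734),
            // so plain SimpleFSDirectory is usually no worse there.
            return new SimpleFSDirectory(path);
        }
        // On Unix, NIOFSDirectory allows concurrent reads on one descriptor.
        return new NIOFSDirectory(path);
    }
}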
How to close IndexSearcher so that index file gets released?
Hi, I want to be able to regenerate the index from time to time. I'm using an IndexSearcher for search and want to be able to release the current index files so that I can replace them with new ones. But once the IndexSearcher is instantiated, it does not seem to release the index files even if I call close(). I'm running the test on Windows XP. Here is a short test that I use:

String indexDir = "C:/IndexTemp2/index/";
IndexSearcher searcher = new IndexSearcher(new MMapDirectory(new File(indexDir)));
searcher.close();
/* Trying to see if the index file can be modified */
new FileWriter(indexDir + "_0.cfs");
/* java.io.FileNotFoundException: C:\IndexTemp2\index\_0.cfs (The requested operation cannot be performed on a file with a user-mapped section open.) */

After I close the IndexSearcher I try to check whether I can modify the file, but it is in use. Could someone tell me what is the correct way to close the IndexReader? I will try to attach the JUnit test class and index directory as a ZIP archive to this message. Thanks, Sergey
Re: How to close IndexSearcher so that index file gets released?
Read the javadocs for MMapDirectory. -- Ian.

On Mon, Aug 16, 2010 at 2:21 PM, Mylnikov Sergey <semy...@yandex.ru> wrote: Hi, I want to be able to regenerate the index from time to time. I'm using an IndexSearcher for search and want to be able to release the current index files so that I can replace them with new ones.
Re: How to close IndexSearcher so that index file gets released?
Thanks, Ian. Somehow I did not bother to read the MMapDirectory javadoc.

16.08.10, 17:27, Ian Lea <ian@gmail.com>: Read the javadocs for MMapDirectory. -- Ian.
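What those javadocs point at is that memory-mapped buffers are only released when the garbage collector gets to them, so on Windows the mapped files stay undeletable even after close(). A sketch of the documented workaround, assuming the Lucene 3.x MMapDirectory API:

import java.io.File;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.MMapDirectory;

public class UnmapExample {
    public static void main(String[] args) throws Exception {
        MMapDirectory dir = new MMapDirectory(new File("C:/IndexTemp2/index"));
        // Ask Lucene to forcibly unmap buffers on close, where the JVM supports it.
        if (MMapDirectory.UNMAP_SUPPORTED) {
            dir.setUseUnmap(true);
        }
        IndexSearcher searcher = new IndexSearcher(dir, true); // read-only searcher
        try {
            // ... run searches ...
        } finally {
            searcher.close(); // with unmap enabled, the .cfs files can now be replaced
            dir.close();
        }
    }
}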
Re: IndexSearcher - open file handles by deleted files
Just closing the IndexSearcher should be enough. Are you really sure you're closing all the IndexSearchers you've opened? Hmm, the code looks somewhat dangerous. Why sleep for 10 seconds before closing? Is this to ensure any in-flight queries finish? It's better to track this explicitly (e.g. with IndexReader's incRef/decRef). What if determineIndexDirectories is called again before 10 seconds have passed? Mike

On Wed, May 26, 2010 at 9:44 AM, Thomas Rewig <tre...@mufin.com> wrote: Hello, I use Lucene 2.9.1 with two indices, which alternate each day. One is live; the other is erased and renewed with the latest data. The problem is that the index files get deleted, but the file handles are still open. If the program (JBoss) is not restarted for some time, disk space becomes scarce. With lsof I see e.g.:

java 6054 root  80r REG 8,1 5939406525 84663 /usr/_index/2/item3_index/_2fdtq2.cfs (deleted)
java 6054 root  82r REG 8,1  401785779 78344 /usr/_index/2/item2_index/_5exkf.cfs (deleted)
java 6054 root  84r REG 8,1  106496943 72217 /usr/_index/2/item1_index/_85bld.cfs (deleted)
java 6054 root 147r REG 8,1 5939406525 84663 /usr/_index/2/item3_index/_2fdtq2.cfs (deleted)
java 6054 root 150r REG 8,1  401785779 78344 /usr/_index/2/item2_index/_5exkf.cfs (deleted)

### open a specific searcher: ###

public static Searcher getSearcher(String indexName) throws IOException {
    Searcher searcher = searchersList.get(indexName);
    if (searcher == null) {
        String path = getPath(subDir, indexName);
        Directory directory = new NIOFSDirectory(new File(path));
        searcher = new IndexSearcher(directory, true);
        directoriesList.put(indexName, directory);
        searchersList.put(indexName, searcher);
    }
    return searcher;
}

### switch the searchers: ###

public static void determineIndexDirectories() {
    searchersListOld = searchersList;
    searchersList = new Hashtable<String, Searcher>();
    directoriesListOld = directoriesList;
    directoriesList = new Hashtable<String, Directory>();
    subDir = getLastIndexDir();
    closeOldSearchers();
}

### close the searchers: ###

private static void closeOldSearchers() {
    new Thread() {
        public void run() {
            try {
                sleep(10 * 1000);
            } catch (InterruptedException e) {
                logger.error("IndexManager.closeOldSearchers", e);
            }
            for (Searcher searcher : searchersListOld.values()) {
                try {
                    searcher.close();
                } catch (IOException e) {
                    logger.error("Error closing Searcher.", e);
                }
            }
            searchersListOld.clear();
            searchersListOld = null;
            for (Directory directory : directoriesListOld.values()) {
                try {
                    directory.close();
                } catch (IOException e) {
                    logger.error("Error closing Directory.", e);
                }
            }
            directoriesListOld.clear();
            directoriesListOld = null;
        }
    }.start();
}

I searched for this problem in the mailing list, and there are similar problems with an IndexReader that was not closed correctly. If I create an IndexSearcher from a Directory, could it be that there is a similar problem, e.g. that the underlying IndexReader (if there is one) is not closed automatically when I call searcher.close()? Do I have to close anything other than all the IndexSearchers and Directories? Or am I wrong with my assumption, and the problem is somewhere else? Best, Thomas
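The explicit tracking Mike suggests could replace the sleep entirely. A minimal sketch (Lucene 2.9/3.x API; the acquire/release names are illustrative - Lucene's own SearcherManager later standardised this pattern):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SwappableSearcher {
    private IndexSearcher current;

    public SwappableSearcher(IndexReader reader) {
        // reader arrives with refCount == 1; this object now owns that reference
        current = new IndexSearcher(reader);
    }

    // Every query must bracket its work with acquire()/release().
    public synchronized IndexSearcher acquire() {
        current.getIndexReader().incRef(); // pin the reader for this query
        return current;
    }

    public void release(IndexSearcher s) throws IOException {
        s.getIndexReader().decRef(); // file handles are freed at refCount == 0
    }

    // Called on the daily index swap; no sleep() needed. The old reader's
    // deleted files are released as soon as the last in-flight query finishes.
    public synchronized void swap(IndexReader newReader) throws IOException {
        IndexReader old = current.getIndexReader();
        current = new IndexSearcher(newReader);
        old.decRef(); // drop the reference this manager was holding
    }
}

Each search thread would then do: IndexSearcher s = manager.acquire(); try { ... search ... } finally { manager.release(s); }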