Re: How to search documents taking in account the dates ???

2008-12-18 Thread Ian Lea
Lucene lets you sort by multiple fields, including score. See the javadocs for Sort and SortField, specifically SortField.SCORE. -- Ian. On Wed, Dec 17, 2008 at 8:15 PM, Ariel isaacr...@gmail.com wrote: Hi: This solution has a problem. The results are sorted by the year criteria but I

Re: Cache Used by IndexReader/IndexSearcher

2008-12-18 Thread Ian Lea
Hi Are all the queries broadly similar or are the later ones more complex? What happens if you switch the order and run the later queries first? Any complications like sorting? Has your jvm got enough memory? There is no IndexSearcher cache that you can increase. -- Ian. On Wed, Dec 17,

Re: IndexReader delete

2008-12-18 Thread Ganesh
I am planning to keep indexing and searching in a single process and expose the search functionality as a service. In any case, I want the deletion to be done by the reader, so that it is reflected immediately in search. If it is done by the writer, then I need to commit the changes, reopen the

Re: Persian (Farsi) Language Analyzer

2008-12-18 Thread Grant Ingersoll
I don't know of any. I'd google for Persian Lucene or Farsi Lucene. When I did that, I did see some researchers who did some experiments w/ Lucene and Persian. On Dec 17, 2008, at 8:12 AM, Ian Vink wrote: I have ported the Java version of the Arabic analyzer recently committed to

Re: Order of fields returned by Document.getFields()

2008-12-18 Thread Grant Ingersoll
On Dec 17, 2008, at 11:56 AM, Yonik Seeley wrote: On Wed, Dec 17, 2008 at 10:32 AM, Patrick Johnstone pjohnst...@dejavunet.net wrote: As I said in the original email, my issue is that I don't think Lucene is returning the fields in the original order anymore. Hmmm, you're right.

RE: double metaphone for misspellings

2008-12-18 Thread Max Metral
Somehow I seem to have missed (and can't find) your original mail, but it seems like you're asking about using double metaphone for place names. We've done this on our site (http://boston.povo.com) for street and place names, and I can't say we've been happy with the results. We're toying with

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Ariel
What I am doing is this: Sort sort = new Sort(); sort.setSort(year, true); hits = searcher.search(pquery, sort); How should I write my code to sort first by date and then by score? Greetings, Ariel. On Thu, Dec 18, 2008 at 4:48 AM, Ian Lea

Re: Combining results of multiple indexes

2008-12-18 Thread Erick Erickson
You will be stunned at how easy it is. The merging code should be a dozen lines (and that only if you are merging 6 or so indexes). See IndexWriter.addIndexes or IndexWriter.addIndexesNoOptimize. Best, Erick. On Thu, Dec 18, 2008 at 5:03 AM, Preetham Kajekar preet...@cisco.com wrote: Hi, I

Re: How to search documents taking in account the dates ???

2008-12-18 Thread John Byrne
Hi, I think this should do it... SortField dateSortField = new SortField(year, false); // the second argument reverses the sort direction if set to true SortField scoreSortField = new SortField(null, SortField.SCORE, false); // value of null for field, since 'score' is not
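The ordering asked for in this thread (year as the primary key, relevance score as the tie-breaker) can be modelled in plain Java, independent of Lucene. The sketch below is only an illustration of that two-key ordering: the Hit class, its fields, and the sortByYearThenScore helper are hypothetical names and are not part of the Lucene API. It assumes ascending year and, within a year, highest score first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of "sort by year, then by score". No Lucene classes are used.
class YearThenScoreDemo {

    // Hypothetical stand-in for a search hit: an id, a year value, and a relevance score.
    static final class Hit {
        final String id;
        final int year;
        final float score;

        Hit(String id, int year, float score) {
            this.id = id;
            this.year = year;
            this.score = score;
        }
    }

    // Returns a sorted copy: year ascending, then score descending within the same year.
    static List<Hit> sortByYearThenScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> {
            int byYear = Integer.compare(a.year, b.year);
            // Only fall back to the score when the years are equal.
            return byYear != 0 ? byYear : Float.compare(b.score, a.score);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 2007, 0.2f),
                new Hit("b", 2008, 0.5f),
                new Hit("c", 2007, 0.9f),
                new Hit("d", 2008, 0.1f));
        for (Hit h : sortByYearThenScore(hits)) {
            System.out.println(h.year + " " + h.score + " " + h.id);
        }
    }
}
```

In Lucene itself the same idea is expressed by passing several SortField objects to a Sort, with the score field listed after the year field so that it only breaks ties.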

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Erick Erickson
Use the setSort that takes an array of SortField objects... On Thu, Dec 18, 2008 at 8:11 AM, Ariel isaacr...@gmail.com wrote: What I am doing is this: Sort sort = new Sort(); sort.setSort(year, true); hits = searcher.search(pquery,sort); How I must

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Ariel
Thank you, it works very well. Regards, Ariel. On Thu, Dec 18, 2008 at 8:22 AM, Erick Erickson erickerick...@gmail.com wrote: Use the setSort that takes an array of SortField objects... On Thu, Dec 18, 2008 at 8:11 AM, Ariel isaacr...@gmail.com wrote: What I am doing is this:

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Thanks. Yep, the code is very easy. However, it takes about 3 minutes to complete merging. Looks like I will need to have an out-of-band merging of indexes once they are closed (planning to store about 50 million entries in each index partition). However, as the data is being indexed, is there any

Re: lucene 2.4 sorting slowness

2008-12-18 Thread Chris Salem
That makes it much faster (100ms after the first run). Thanks a lot. Also, the index will be updated often throughout the day; will keeping the IndexReader open recognize updates to the index? Sincerely, Chris Salem Development Team Main Sequence Technologies, Inc. PCRecruiter.net -

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Hi, I noticed that the doc id is the same. So, if I have a HitCollector, just collect the doc ids of both Searchers (for the two indexes) and find the intersection between them, it would work. Also, getting the doc ids is fast even when there is a large number of hits. Of course, I am using

Re: RESOLVED: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-18 Thread Paul Elschot
On Wednesday 17 December 2008 22:49:08, 1world1love wrote: Just an FYI in case anyone runs into something similar. Essentially I had indexes that I have been searching from a Java stored procedure in Oracle without issue for a while. All of a sudden, I started getting the error I alluded to

RE: double metaphone for misspellings

2008-12-18 Thread Geoff Hendrey
I would think that if the place names are English, which those in Boston would be, then they would be reasonable candidates for soundex and double metaphone. I am considering an approach where I store SOUNDEX, refined SOUNDEX, doublemetaphone, and I'll look into ngram as well, and search against
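For readers unfamiliar with the phonetic codes mentioned here, the following is a minimal sketch of classic American Soundex in plain Java. It is not the implementation used by Lucene or by any particular library, and the class and method names are made up for illustration. Double metaphone is considerably more involved and is not shown.

```java
// Minimal sketch of classic American Soundex. Names are illustrative only.
class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that carry no code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    // Returns the four-character Soundex code, or "" if the input has no ASCII letters.
    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < name.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);      // the first letter is kept as-is
                lastCode = code;    // but its code still suppresses an equal neighbour
            } else if (c == 'H' || c == 'W') {
                continue;           // H and W do not separate letters with equal codes
            } else if (code == '0') {
                lastCode = '0';     // vowels (and Y) do separate equal codes
            } else {
                if (code != lastCode) {
                    out.append(code);
                }
                lastCode = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');        // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Robert", "Rupert", "Ashcraft", "Tymczak"}) {
            System.out.println(s + " -> " + encode(s));
        }
    }
}
```

Names that sound alike collapse to one code (Robert and Rupert both give R163), which is the behaviour a misspelling-tolerant field would rely on; it is also why such codes can over-match on short place names.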

Re: Combining results of multiple indexes

2008-12-18 Thread Michael McCandless
These results are surprising. I'd expect single IndexWriter with 2 threads to do better than a single thread, but in your test two threads are significantly worse than one. Is it possible there's a bottleneck outside of Lucene in sourcing the documents? How many segments are produced

Re: Combining results of multiple indexes

2008-12-18 Thread Erick Erickson
I would recommend, very strongly, that you don't rely on the doc IDs being the same in two different indexes. Doc IDs are just incremented by one for each doc added, but optimization can change the doc ID, and is guaranteed to change at least some of them if there are deletions from your
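The warning above can be illustrated with a toy model in plain Java. This is not how Lucene stores documents; it only assumes, as described in this thread, that doc IDs are assigned in order of addition and that optimizing compacts away deleted documents. The list position plays the role of the doc ID, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of doc ID renumbering. The list position stands in for the doc ID.
class DocIdShiftDemo {

    // Marks a document as deleted without changing the IDs of the others.
    static void delete(List<String> index, String key) {
        int id = index.indexOf(key);
        if (id >= 0) {
            index.set(id, null); // null marks a deleted slot
        }
    }

    // Models an optimize: deleted slots are dropped, so later docs move to smaller IDs.
    static List<String> optimize(List<String> index) {
        List<String> compacted = new ArrayList<>();
        for (String key : index) {
            if (key != null) {
                compacted.add(key);
            }
        }
        return compacted;
    }

    // Looks up the current doc ID for an application-level key, or -1 if absent.
    static int docIdOf(List<String> index, String key) {
        return index.indexOf(key);
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println("c before: " + docIdOf(index, "c"));
        delete(index, "b");
        index = optimize(index);
        System.out.println("c after:  " + docIdOf(index, "c"));
    }
}
```

Because the ID of "c" changes once "b" is removed and the index is compacted, intersecting raw doc IDs from two indexes is fragile; joining on a stable key stored in each document avoids the problem.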

Approximate release date for Lucene 2.9

2008-12-18 Thread Kay Kay
Hi - I am just curious - what is the approximate release target date that we have for Lucene 2.9 ( currently in beta in dev).

Field.omitTF

2008-12-18 Thread John Wang
Hi: In lucene 2.4, when Field.omitTF() is called, payload is disabled as well. Is this intentional? My understanding is payload is independent from the term frequencies. Thanks -John

Re: Field.omitTF

2008-12-18 Thread Mark Miller
Drops positions as well. - Mark On Dec 18, 2008, at 4:57 PM, John Wang john.w...@gmail.com wrote: Hi: In lucene 2.4, when Field.omitTF() is called, payload is disabled as well. Is this intentional? My understanding is payload is independent from the term frequencies. Thanks -John

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Michael McCandless
Well... there are a couple threads on java-dev discussing this now: http://www.nabble.com/2.9-3.0-plan---Java-1.5-td20972994.html http://www.nabble.com/2.9,-3.0-and-deprecation-td20099343.html though they seem to have petered out. Also we have 29 open issues for 2.9:

Re: Field.omitTF

2008-12-18 Thread John Wang
Thanks Mark! I don't think it is documented (at least the ones I've read), should this be considered as a bug or ... ? Thanks -John On Thu, Dec 18, 2008 at 2:05 PM, Mark Miller markrmil...@gmail.com wrote: Drops positions as well. - Mark On Dec 18, 2008, at 4:57 PM, John Wang

Re: Field.omitTF

2008-12-18 Thread Mark Miller
No, not a bug, it's certainly the intended behavior (though the name is a bit tricky, isn't it? I've actually thought about that in the past myself). If you check out the javadoc on Fieldable you'll find: /** Expert: * * If set, omit term freq, positions and payloads from postings for this

optimize: went from 14488449 to 38449

2008-12-18 Thread 1world1love
Ok. This is crazy. I have an index with 14,488,449 docs in it. Today I did a CheckIndex on it and everything looked fine. I made a copy of the index, ran a delete on about 1.3 million docs and then did an optimize and now my doc count is 38449. The index was originally built with 2.3, but I am

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Ganesh
Does Lucene 2.9 have real time search? Any improvements in sorting? Any facility to store a payload per document (without updating document)? Please highlight the important features. Regards Ganesh - Original Message - From: Michael McCandless luc...@mikemccandless.com To:

Re: optimize: went from 14488449 to 38449

2008-12-18 Thread Ganesh
Optimize will remove the deletes and rearrange the document numbers. Have you done some deletes before deleting 1.3 million docs? Regards Ganesh - Original Message - From: 1world1love jd_co...@yahoo.com To: java-user@lucene.apache.org Sent: Friday, December 19, 2008 9:49 AM Subject:

Re: optimize: went from 14488449 to 38449

2008-12-18 Thread 1world1love
Ganesh - yahoo wrote: Optimize will remove the deletes and rearrange the document numbers. Have you done some deletes before deleting 1.3 million docs? No, that is the crazy part. I haven't done anything to this index since it was first compiled until I did the deletes. That is why I

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Mark Miller
Well, look at the issues and see for yourself :) It's a subjective call I think. Here's my take: There are not going to be too many sweeping changes in the next release. There are tons of little bug fixes and improvements, but not a lot of the bullet point type stuff that you mention in your

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Mark Miller
Mark Miller wrote: TrieRangeQuery has been added to contrib. Super awesome, super efficient, large scale sorting. Sorry. It's way past my bedtime. Large scale numerical range searching. Sorting on the brain.

Re: Field.omitTF

2008-12-18 Thread John Wang
Thanks Mark for the pointer! -John On Thu, Dec 18, 2008 at 6:13 PM, Mark Miller markrmil...@gmail.com wrote: No, not a bug, certainly its the intended behavior (though the name is a bit tricky isn't it? I've actually thought about that in the past myself). If you check out the javadoc on