Re: Boosting results

2008-11-07 Thread Erick Erickson
D'oh, sorting. I absolutely love it when I overlook the obvious <G>. [EMAIL PROTECTED] On Fri, Nov 7, 2008 at 4:58 AM, Michael McCandless [EMAIL PROTECTED] wrote: Couldn't you just do a single Query that sorts first by category and second by relevance? Mike Erick Erickson wrote: It

Re: Boosting results

2008-11-07 Thread Michael McCandless
Couldn't you just do a single Query that sorts first by category and second by relevance? Mike Erick Erickson wrote: It seems to me that the easiest thing would be to fire two queries and then just concatenate the results category:A AND body:fred category:B AND body:fred If you
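The two-key ordering suggested here (category first, relevance second) can be sketched in plain Java. This is only an illustration of the ordering, not Lucene code: `CategoryHit` and `CategoryThenRelevance` are hypothetical names, and the assumption is that each hit carries a category string and a relevance score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit; not a Lucene class.
final class CategoryHit {
    final String category;
    final float score;
    final int docId;

    CategoryHit(String category, float score, int docId) {
        this.category = category;
        this.score = score;
        this.docId = docId;
    }
}

final class CategoryThenRelevance {
    // Primary key: category, ascending. Secondary key: score, descending.
    static final Comparator<CategoryHit> ORDER =
        Comparator.comparing((CategoryHit h) -> h.category)
                  .thenComparing(
                      Comparator.comparingDouble((CategoryHit h) -> h.score).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    static List<CategoryHit> sort(List<CategoryHit> hits) {
        List<CategoryHit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        return copy;
    }
}
```

With this ordering, every category A hit precedes every category B hit, which matches what concatenating the two separate queries would produce.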

Re: Boosting results

2008-11-07 Thread Matthew DeLoria
This actually brings up an interesting question, and something I have been curious about. In this case, does it make more sense to do Boosting by Category, or to do sorting? From what I understand, Lucene sorting involves putting the relevant fields into memory, and then executing a sort. Is

RE: Boosting results

2008-11-07 Thread Scott Smith
Well, it's not like sorting hadn't occurred to me. Unfortunately, what I recalled was that you could only sort results on one field (I do date sorted searches all the time in my application). I should have gone back and looked. My memory failed me as I can see that you can sort on multiple

Re: Boosting results

2008-11-07 Thread Michael McCandless
This is a good point. Sorting populates the field cache (internal to Lucene) for that field, meaning it loads all values for all docs and holds them in memory. This makes the first query slow and consumes RAM in proportion to how large your index is. Whereas boosting should be able

term offsets wrong depending on analyzer

2008-11-07 Thread Christian Reuschling
Hi Guys, I'm currently seeing a bug where term offset values are wrong for fields analyzed with KeywordAnalyzer (and also for unanalyzed fields, where I assume the code may be the same). The offset of a field seems to be incremented by the entry length of the previously analyzed field. I had a look into

Re: Boosting results

2008-11-07 Thread Peter Keegan
If you sort first by score, keep in mind that the raw scores are very precise and you could see many unique values in the result set. The secondary sort field would only be used to break equal scores. We had to use a custom comparator to 'smooth out' the scores to allow the second field to take
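One possible way to "smooth out" scores is to collapse them into coarse buckets before comparing, so that near-equal scores tie and the second field decides. The sketch below is plain Java and is not the comparator the poster used: `ScoredDoc`, `SmoothedScoreOrder`, the bucket width, and the choice of a date as the secondary field are all assumptions made for illustration.

```java
import java.util.Comparator;

// Hypothetical hit with a raw score and a secondary sort value (here a date).
final class ScoredDoc {
    final float score;
    final long date;

    ScoredDoc(float score, long date) {
        this.score = score;
        this.date = date;
    }
}

final class SmoothedScoreOrder {
    // Collapse a raw score into a coarse bucket so near-equal scores tie.
    static int bucket(float score, float bucketWidth) {
        return (int) Math.floor(score / bucketWidth);
    }

    // Higher bucket first; within a bucket, newest date first.
    static Comparator<ScoredDoc> byBucketThenDate(float bucketWidth) {
        return Comparator
            .comparingInt((ScoredDoc d) -> bucket(d.score, bucketWidth))
            .reversed()
            .thenComparing(
                Comparator.comparingLong((ScoredDoc d) -> d.date).reversed());
    }
}
```

The bucket width controls the trade-off: a wider bucket lets the secondary field reorder more results, at the cost of ignoring larger differences in relevance.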

Re: BoostingTermQuery scoring

2008-11-07 Thread Peter Keegan
boost:(+petroleum +engineer +refinery) (+contents:(+petroleum +engineer +refinery) +((*:* -boost:petroleum) (*:* -boost:engineer) (*:* -boost:refinery))) That's an interesting solution. Would this result in many more documents being visited by the scorer, possibly impacting

Re: term offsets wrong depending on analyzer

2008-11-07 Thread Michael McCandless
Thanks for raising these! For the 1st issue (KeywordTokenizer fails to set start/end offset on its token), I think we can add your two lines to fix it. I'll open an issue for this. The 2nd issue (if the same field name has more than one NOT_ANALYZED instance in a doc then the offsets are double

Term numbering and range filtering

2008-11-07 Thread Tim Sturge
Hi, I'm wondering if there is any easy technique to number the terms in an index. (By number I mean map a sequence of terms to a contiguous range of integers and map terms to these numbers efficiently.) Looking at the Term class and the .tis/.tii index format it appears that the terms are stored
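The mapping being asked about can be illustrated outside Lucene with a sorted array: a term's number is its position in sorted order, and a term range becomes a contiguous range of integers. This is a minimal sketch in plain Java under that assumption. `TermNumbering` is a hypothetical class, it does not read the .tis/.tii files, and it orders terms by Java's `String.compareTo`.

```java
import java.util.Arrays;

final class TermNumbering {
    private final String[] sortedTerms;

    TermNumbering(String[] terms) {
        // Deduplicate and sort so each term gets one stable ordinal.
        this.sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    // Ordinal of the term, or -1 if the term is unknown.
    int numberOf(String term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? i : -1;
    }

    // Inverse mapping from ordinal back to term.
    String termOf(int number) {
        return sortedTerms[number];
    }

    // Ordinals of all terms in [lower, upper], returned as {from, toExclusive}.
    int[] rangeOf(String lower, String upper) {
        int from = Arrays.binarySearch(sortedTerms, lower);
        if (from < 0) from = -from - 1;          // insertion point of lower
        int to = Arrays.binarySearch(sortedTerms, upper);
        to = to < 0 ? -to - 1 : to + 1;          // one past the last match
        return new int[] { from, to };
    }
}
```

Both lookups are binary searches, so a range check on terms reduces to two O(log n) lookups followed by integer comparisons.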

RE: searchable archives

2008-11-07 Thread Dragon Fly
http://www.gossamer-threads.com/lists/lucene/java-user/ Date: Fri, 7 Nov 2008 14:27:38 -0700 From: [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: searchable archives Hey, Is this list available somewhere that you can search the entire archives at one time? Thanks, Chad

searchable archives

2008-11-07 Thread ChadDavis
Hey, Is this list available somewhere that you can search the entire archives at one time? Thanks, Chad

Store versus Index

2008-11-07 Thread ChadDavis
I just need a little confirmation of my understanding here. If I say that a field is to be stored, the entire thing is written to the index. It might also be indexed in a tokenized fashion if I also specify that. What are the advantages to storing a field then? So you can search for that field?

changes

2008-11-07 Thread ChadDavis
I'm upgrading from a very old version of Lucene to 2.4. I tried to research all the possible changes; this included reading the change file from the 2.4 version, which appears to reach back through all of the versions. However, I'm finding major API changes that aren't documented in that file.

Re: searchable archives

2008-11-07 Thread Mark Miller
Or nabble or markmail - Mark On Nov 7, 2008, at 3:33 PM, Dragon Fly [EMAIL PROTECTED] wrote: http://www.gossamer-threads.com/lists/lucene/java-user/ Date: Fri, 7 Nov 2008 14:27:38 -0700 From: [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: searchable archives Hey, Is this

Re: Store versus Index

2008-11-07 Thread Yonik Seeley
On Fri, Nov 7, 2008 at 4:36 PM, ChadDavis [EMAIL PROTECTED] wrote: I just need a little confirmation of my understanding here. If I say that a field is to be stored, the entire thing is written to the index. It might also be indexed in a tokenized fashion if I also specify that. Right.

Lucene and JSP

2008-11-07 Thread Rafael Cunha de Almeida
Hello, I'm writing my first JSP application, so this may be too much of a newbie question, in which case I hope you can refer me to documentation which can help me out. How do I keep only one IndexSearcher open for all the searches on my website?
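A common pattern for this kind of question is to hold one lazily created instance in application-wide state and hand the same object to every request. The sketch below shows only that pattern in plain Java: `SharedInstance` is a hypothetical helper, and the factory passed to it is assumed to be whatever code opens the searcher. It does not cover reopening the searcher after the index changes.

```java
import java.util.function.Supplier;

// Hypothetical holder that creates its value once and shares it across threads.
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    // Double-checked locking: only the first caller pays for creation.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In a web application the holder itself would live somewhere shared by all requests, for example a static field or an attribute on the servlet context, so that each JSP calls `get()` instead of opening its own searcher.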

Strange behaviour of FrenchAnalyzer when using accents

2008-11-07 Thread lamino
Greetings, I'm getting a strange behaviour when using the FrenchAnalyzer. Calling the same class (Searcher.java, see below) from a JSP file and from a Java class gives different results when the query contains accents! Notice the different value of the query object: q = secrétaire If