Thanks for responding Ype.
> -----Original Message-----
> From: Ype Kingma [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 23, 2003 2:03 PM
> To: Lucene Users List
> Subject: Re: Dates and others
>
> On Saturday 22 November 2003 18:33, Dion Almaer wrote:
> ...
> >
> > 1. The power of dates:
> >
> > I am fairly happy with the results of queries on my index. The
> > only issue I have is that at the moment the date of the content
> > isn't considered (since Lucene doesn't know about it). Is there a
> > good way in which the date of the content could be used to help
> > with the scoring? So more recent content shows up higher in the
> > stack. I have a date keyword field, but it isn't part of the
> > query itself. Are there any patterns to help with this?
>
> You can use the Lucene date field, or use a keyword field eg.
> in yyyymmdd format. However, Lucene's scoring is not based on
> the value of a matching term, it's based on term frequencies
> in documents, on the number of documents in the index
> containing the term, and on the distance between terms (for
> proximity queries.) You cannot make the document score depend
> directly on the value of a (date) field in the document.
> Btw, how big would you want the date influence to be in the score?
>
> Sorting results by date has been discussed in the past, see
> the archives.
> You lose the document scores in this case.
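A minimal sketch of the yyyymmdd keyword idea mentioned above, using only the Java standard library (no Lucene calls). It shows that strings in that format compare in the same order as the dates they encode, which is what a range over a keyword field relies on. The class and method names (`DateKeyword`, `toKeyword`) are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DateKeyword {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20031123.
    static String toKeyword(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        String older = toKeyword(LocalDate.of(2003, 1, 1));
        String newer = toKeyword(LocalDate.of(2003, 11, 23));
        // Lexicographic order of the keywords matches chronological order.
        System.out.println(older + " sorts before " + newer + ": "
                + (older.compareTo(newer) < 0));
    }
}
```

Fixed-width, most-significant-part-first strings are what make this work; a format such as m/d/yy would not sort chronologically as plain text.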
Yeah this is tough. I don't want to sort by date, as then something that had a really
low score but was recent would show up at the top.
I think I will stick with giving the user the ability to say "between Jan 1 2003 and
..." instead.
This leads me to another issue actually. On certain range queries I get exceptions:
Query: modifieddate:[1/1/03 TO 12/31/03]
org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:109)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:101)
at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:137)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:188)
at org.apache.lucene.search.Query.weight(Query.java:120)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:128)
at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:93)
at org.apache.lucene.search.Hits.<init>(Hits.java:80)
at org.apache.lucene.search.Searcher.search(Searcher.java:71)
at org.apache.lucene.search.Searcher.search(Searcher.java:65)
at com.portal.util.search.IndexSearch.search(Unknown Source)
at com.portal.util.search.IndexSearch.main(Unknown Source)
Has anyone run into this problem?
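A possible explanation, sketched without calling Lucene: the trace shows RangeQuery.rewrite adding clauses to a BooleanQuery until TooManyClauses is thrown, which is consistent with the range being expanded into one clause per distinct indexed term that falls inside it. The code below only counts how many distinct terms a full-year range would cover. It assumes, hypothetically, one document indexed per hour during 2003 and a clause limit of 1024 (the default I am aware of; check your version). `RangeExpansion`, `distinctTerms` and `ASSUMED_MAX_CLAUSES` are illustrative names, not Lucene API.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
import java.util.TreeSet;

final class RangeExpansion {
    // Assumption: the clause limit in effect is 1024.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    // Counts distinct date terms when one document is indexed every
    // stepHours hours during 2003. Each term is a yyyyMMddHHmm string
    // truncated to termLength characters (8 = day, 12 = minute).
    static int distinctTerms(int stepHours, int termLength) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
        Set<String> terms = new TreeSet<>();
        LocalDateTime t = LocalDateTime.of(2003, 1, 1, 0, 0);
        LocalDateTime end = LocalDateTime.of(2004, 1, 1, 0, 0);
        while (t.isBefore(end)) {
            terms.add(t.format(format).substring(0, termLength));
            t = t.plusHours(stepHours);
        }
        return terms.size();
    }

    public static void main(String[] args) {
        int fine = distinctTerms(1, 12);
        int coarse = distinctTerms(1, 8);
        System.out.println("minute-resolution terms: " + fine
                + ", over limit: " + (fine > ASSUMED_MAX_CLAUSES));
        System.out.println("day-resolution terms: " + coarse
                + ", over limit: " + (coarse > ASSUMED_MAX_CLAUSES));
    }
}
```

Under these assumptions a year-long range covers 8760 minute-resolution terms but only 365 day-resolution terms, so indexing the date at a coarser granularity is one way to stay under the limit. Raising the limit, if your Lucene version exposes a setting for it, is the other.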
> > 2. +field:foo and the QueryParser:
> >
> > I ran into some problems where using +field:foo was giving strange
> > results. When I changed the queries to "... AND field:foo"
> > everything was fine.
> > Am I missing something there?
>
> Which version of Lucene are you using? There have been some
> fixes in the query parser of Lucene 1.2, but I don't know
> precisely which.
I am using 1.3 RC 2. The AND workaround is fine... just caught me
by surprise.
> > 3. I have some fields such as title, owner, etc. as well as the
> > content blob which I index and use as the default search field. Is
> > there an easy way to extend the QueryParser to merge it with a
> > MultiTermQuery which can also search this metadata and give them
> > certain weights?
> > Or, if you go down
>
> You can provide field weights at document indexing time
> (norms) and use a MultiTermQuery for searching multiple
> fields. At query time you can again use field weights.
> I don't know how the scoring of the MultiTermQuery is done,
> it might use the max. score over the fields of a document, or
> combine the scores in the fields of a document.
Yeah I will play with this. For now adding the title to the main body does seem to
work pretty well, so it may be good enough!
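One way to try query-time field weights without leaving the QueryParser behind is to expand the user's terms into a query string that names each field with a boost, using the parser's `field:term^boost` syntax. The sketch below only builds that string and handles plain whitespace-separated terms, not operators or phrases. The field names follow the ones mentioned above (title, content); the boost values and the names `MultiFieldExpansion` and `expand` are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class MultiFieldExpansion {
    // Turns "lucene dates" into one parenthesised group per term, where each
    // group lists the term once per field with that field's boost attached.
    static String expand(String userQuery, Map<String, Float> fieldBoosts) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : userQuery.trim().split("\\s+")) {
            StringJoiner perTerm = new StringJoiner(" ", "(", ")");
            for (Map.Entry<String, Float> field : fieldBoosts.entrySet()) {
                perTerm.add(field.getKey() + ":" + term + "^" + field.getValue());
            }
            query.add(perTerm.toString());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 2.0f);   // hypothetical weight
        boosts.put("content", 1.0f); // hypothetical weight
        System.out.println(expand("lucene dates", boosts));
    }
}
```

The resulting string can then be handed to the QueryParser as usual. Whether this scores better than simply adding the title to the body is something to measure on real data.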
> > this path do you have to leave the QueryParser behind and build
> > your own queries? Any best practices would be great.
>
> You have some options:
> - create the MultiTermQuery from the query text, or
> - index the default search field as a single field, eg. by
> concatenation, and possibly by inserting empty tokens in between
> to avoid proximity matches.
> This has also been discussed recently, see eg. the discussion
> on indexing of sentences.
>
> Searching multiple fields is normally a little slower than
> searching a concatenated field. The actual difference depends
> on your data, so you might experiment a bit. You might eg.
> index all fields separately, and also index a default
> concatenated field.
>
> Kind regards,
> Ype Kingma
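A small sketch of the concatenation option with empty tokens between fields, written against plain Java lists rather than Lucene's analysis classes. It shows why the gap matters: without it, the last word of one field and the first word of the next look like an adjacent phrase. The names `ConcatenatedField`, `concatenate` and `matchesPhrase`, and the sample field values, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ConcatenatedField {
    // Joins the tokens of several field values into one token list,
    // inserting `gap` empty tokens between consecutive fields.
    static List<String> concatenate(int gap, String... fieldValues) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < fieldValues.length; i++) {
            if (i > 0) {
                for (int g = 0; g < gap; g++) {
                    tokens.add("");
                }
            }
            String normalized = fieldValues[i].trim().toLowerCase(Locale.ROOT);
            for (String token : normalized.split("\\s+")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // True if `first` is immediately followed by `second` somewhere in tokens.
    static boolean matchesPhrase(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String title = "release notes";
        String content = "lucene scoring explained";
        // "notes lucene" spans the field boundary.
        System.out.println("no gap:  "
                + matchesPhrase(concatenate(0, title, content), "notes", "lucene"));
        System.out.println("gap = 1: "
                + matchesPhrase(concatenate(1, title, content), "notes", "lucene"));
    }
}
```

Phrases that lie inside a single field still match with the gap in place; only matches across the field boundary are suppressed. A sloppy proximity query would need a gap at least as large as the allowed distance.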
Thanks a lot for the great ideas.
Dion