Hi Che,

The presort method was our first approach, but it doesn't work in practice: we update the index incrementally, so insertion order doesn't match date order once updates are added.
I don't think sorting only the top hits will deliver what the user is expecting -- that is, results listed in most-recent-first order. Is there a better way to do it?

BTW, we're not seeing a ridiculous performance degradation arising out of sorts on large result sets. But on the other hand, sort doesn't seem to be working very well, so far...

Regards,
James

--- Che Dong <[EMAIL PROTECTED]> wrote:

> Just like Google said: a full text search service is not a traditional
> database application. Lucene is not a database either: if you want to sort on
> some fields, you'd better pre-sort on them before indexing: like date. Then
> get results by doc id.
>
> For Lucene you can only sort results in top hits. If you sort 400k
> result hits by date, you lose the speed of Lucene.
>
>
> Thanks
>
> Che Dong
> http://www.chedong.com/
>
> Erik Hatcher wrote:
> >
> > On Apr 21, 2005, at 5:22 PM, James Levine wrote:
> >
> >> I have an index of around 3 million records, and typical queries
> >> can result in result sets of between 1 and 400,000 results.
> >>
> >> We have indexed "dateTime" fields in the form 20050415142, that is, to
> >> 10-minute precision.
> >>
> >> When I try to sort queries I get something back that is roughly sorted
> >> on index, but not quite. Stuff is out of order just a bit. The
> >> size of the result set does not seem to be related to the occurrence of
> >> this problem.
> >>
> >> We've tried lucene 1.4-final and 1.4.3.
> >>
> >> my code looks like this
> >>
> >> s = new Sort( new SortField[] { new SortField( "dateTime",
> >>                                                SortField.STRING,
> >>                                                true ), SortField.FIELD_SCORE } );
> >>
> >> ...
> >>
> >> hits = searcher.search( qry, s );
> >>
> >>
> >> Any help is appreciated, I'm so far baffled by this problem.
> >
> >
> > I don't have a solution, but rather some questions to check.... are all
> > dateTime's the same width, zero padded on the right? Does every
> > document have a dateTime field?
> >
> > I recommend you sort with type INT instead of STRING if it fits, or
> > maybe LONG. STRING will use the most resources for sorting.
> >
> >     Erik
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
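
A note on the two checks raised in the quoted thread (same-width keys, and whether INT "fits"): both can be tested outside Lucene. The following is a minimal sketch in plain Java with no Lucene calls. The class name `DateKeyCheck` and its helper methods are hypothetical, and the sample keys other than 20050415142 are made up for illustration. It only shows two things: whether a key parses as a 32-bit int, and whether string order agrees with numeric order for a given set of keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene.
public class DateKeyCheck {

    // True if the decimal key parses as a 32-bit signed int.
    static boolean fitsInInt(String key) {
        try {
            Integer.parseInt(key);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if sorting the keys as strings gives the same order
    // as sorting them by their numeric (long) value.
    static boolean stringOrderMatchesNumericOrder(List<String> keys) {
        List<String> byString = new ArrayList<>(keys);
        Collections.sort(byString);

        List<String> byNumber = new ArrayList<>(keys);
        byNumber.sort((a, b) -> Long.compare(Long.parseLong(a), Long.parseLong(b)));

        return byString.equals(byNumber);
    }

    public static void main(String[] args) {
        // The 11-digit form from the thread is larger than Integer.MAX_VALUE (2147483647).
        System.out.println("fits in int: " + fitsInInt("20050415142"));

        // Made-up sample keys: all 11 digits wide.
        List<String> fixedWidth = Arrays.asList("20050415142", "20050414235", "20041231235");
        System.out.println("fixed width agrees: " + stringOrderMatchesNumericOrder(fixedWidth));

        // Made-up sample keys: one key is only 9 digits wide.
        List<String> mixedWidth = Arrays.asList("20050415142", "200504151", "20050414235");
        System.out.println("mixed width agrees: " + stringOrderMatchesNumericOrder(mixedWidth));
    }
}
```

If every key has the same width, string and numeric order agree, so a width mismatch or a missing field is worth ruling out first. The 11-digit form also does not fit in a 32-bit int, so an integer sort type would need either a wider type or a shorter key.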
