On 8/1/06, Pedro Côrte-Real <[EMAIL PROTECTED]> wrote:
> On Tue, 2006-08-01 at 09:24 +0900, David Balmain wrote:
> > How many documents and what is the date range (eg 2001-01-01 ->
> > 2006-08-01). These are the critical variables for sort performance.
> > Once I know these numbers I'll be able to replicate the task here and
> > I'll see what I can do.
>
> I have around 600_000 documents and the date range is rather large,
> something like from year 1000 to now. I don't know for sure but I can
> check if it makes a difference.
>
> But not all my sort fields are dates. I also have regular text fields
> that I have now made untokenized (by using separate fields for sorting
> and searching). Got to check if that made them faster.
Hmmm. Sounds like an interesting application.

One solution would be to cache the sort index on disk. The problem with this is that the cache would still need to be recalculated every time you add more documents to the index, so you'll still have the long wait occasionally. I'll look into it anyway at a later stage.

Another idea that I can implement now is to add a BYTES sort type, which would basically sort by the order in which the terms appear in the index. Let's say you index dates in the format "YYYYMMDD" and you sort by INTEGER. Every time you load the sort index, you need to go through every single date and convert it from a string to an integer. But this is unnecessary, since the dates are already in order in the index. A BYTES sort type would take advantage of this.

You'd get an even bigger benefit for ASCII strings. strcoll is used to sort strings, but this is unnecessary for ASCII strings, as they are already correctly ordered in the index. Also, the index needs to keep each string in memory, which would also be unnecessary.

Sorry if this isn't very clear. I'm not sure how much it will help. We'll have to wait and see.

Dave

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
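A rough Ruby sketch of the difference between the INTEGER and BYTES approaches described in the message. It assumes a simplified in-memory term dictionary (an array of `[term, doc_ids]` pairs already stored in byte order) and that each document has at most one term for the sort field. The names `TERM_DICT`, `integer_sort_index`, `bytes_sort_index` and `sorted_doc_ids` are hypothetical illustrations, not part of Ferret's API. Both versions produce the same document order for "YYYYMMDD" dates; the BYTES version simply records each term's position in the dictionary instead of parsing it.

```ruby
# Hypothetical term dictionary for a "date" field: [term, doc_ids] pairs,
# already sorted in byte order the way a term index stores them.
TERM_DICT = [
  ["10661014", [3]],
  ["19990312", [0, 4]],
  ["20060801", [1]],
  ["20061231", [2]]
].freeze
NUM_DOCS = 5

# INTEGER-style sort index: every term string is parsed into an Integer.
def integer_sort_index(term_dict, num_docs)
  keys = Array.new(num_docs, -1) # -1 for documents without the field
  term_dict.each do |term, doc_ids|
    value = Integer(term, 10) # string -> integer conversion for each term
    doc_ids.each { |doc_id| keys[doc_id] = value }
  end
  keys
end

# BYTES-style sort index: the term's position (ordinal) in the already
# sorted dictionary is used as the sort key, so no parsing is needed.
def bytes_sort_index(term_dict, num_docs)
  keys = Array.new(num_docs, -1) # -1 for documents without the field
  term_dict.each_with_index do |(_term, doc_ids), ordinal|
    doc_ids.each { |doc_id| keys[doc_id] = ordinal }
  end
  keys
end

# Order document ids by their sort key, breaking ties by doc id.
def sorted_doc_ids(keys)
  (0...keys.size).sort_by { |doc_id| [keys[doc_id], doc_id] }
end

int_keys = integer_sort_index(TERM_DICT, NUM_DOCS)
byte_keys = bytes_sort_index(TERM_DICT, NUM_DOCS)
p byte_keys                 # => [1, 2, 3, 0, 1]
p sorted_doc_ids(int_keys)  # => [3, 0, 4, 1, 2]
p sorted_doc_ids(byte_keys) # => [3, 0, 4, 1, 2]
```

Because the BYTES keys are plain integer ordinals, the same function would work unchanged for an untokenized string field, and the term strings would not need to be kept in memory once the ordinals are assigned. This only gives the right answer when byte order is the desired order, which holds for zero-padded "YYYYMMDD" dates (byte order is also what strcoll produces in the C locale).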

