Let's back up a minute. The number of matched records is not important when sorting, what's important is the number of unique terms in the field being sorted. Do you have any figures on that? One very common sorting issue is sorting on very fine date time resolutions, although your examples don't include that...
Now, cache loading is an issue. The very first time you sort on a field, all the values are read into a cache. Is it possible this is the source of your problems? You can cure this with warmup queries. The take-away is that measuring the response time for the first sorted query is very misleading. Although if you're sorting on many, many, many email addresses, that could be "interesting". The comment "returning 10,000,000 documents" is, I hope, a misstatement. If you're trying to *return* 10M docs sorting is irrelevant compared to assembling that many documents. If you're trying to return the first 100 of 10M documents, it should work. Overall, we need more details on what you're sorting and what you're trying to return as well as how you're measuring before we can say much.... Along with how much memory you're giving your JVM to work with, what "exploding" means. Are you CPU bound? IO bound? Swapping? You need to characterize what is going wrong before worrying about solutions...... Best Erick On Mon, Aug 16, 2010 at 3:08 PM, Michel Nadeau <aka...@gmail.com> wrote: > Hi, > > we are building an application using Lucene and we have HUGE data sets (our > index contains millions and millions and millions of documents), which > obviously cause us very important problems when sorting. In fact, we > disabled sorting completely because the servers were just exploding when > trying to sort results in RAM. The users using the system can search for > whatever they want, so we never know how many results will be returned - a > search can return 10 documents (no problem with sorting) or 10,000,000 > (huge > sorting problems). > > I woke up this morning and had a flash : is it possible with Lucene to have > a "natural sorting" of documents? For example, let's say I have 3 columns I > want to be able to sort by : first name, last name, email; I would have 3 > different indexes with the very same data but with a different primary key > for sorting. I know it's far fetched, and I have never seen anything like > that since I use Lucene, but we're just desperate... how people do to have > huge data sets, a lot of users, and sort!? > > Thanks, > > Mike >