Erik,

I'm not using a cached IndexSearcher. Is this an option in an environment where the underlying index changes on a second-by-second basis? At what layer would a cached IndexSearcher be cached? At the Tomcat layer?

Caching at the object layer seems like it might help, but it doesn't address my underlying concern, i.e., the relative performance difference between natural order and sort order. Maybe you're right, and I shouldn't be worried about the very first search against the index.

How would a cached searcher implementation look?

-Dave

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 20, 2007 4:03 PM
To: java-user@lucene.apache.org
Subject: Re: Sort Performance Question

Are you using a cached IndexSearcher such that successive sorts on the same field will be more efficient?

    Erik

On Mar 20, 2007, at 3:39 PM, David Seltzer wrote:

> Hi All,
>
> I have a sort performance question:
>
> I have a fairly large index consisting of chunks of full-text
> transcriptions of television, radio and other media, and I'm trying to
> make it searchable and sortable by date. The search front-end uses a
> ParallelMultiSearcher to search up to three indexes at a time (each
> index contains a month of live data). When I search for the word
> "toast" (for example) sorted by score, the results come back in about
> 1200ms; when I sort by DateTime, the results come back in 3900ms.
>
> Initially I was sorting based on a unixtime field, but having read up
> on it, I switched to a slightly easier format: "yyyyMMddHHmm". This
> value is still larger than an int, so I went one step further and
> created two more fields for test purposes: SortDate, which is yyyyMMdd,
> and SortTime, which is HHmm. When I sort by SortDate then SortTime the
> results come in even slower, around 4300ms.
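As a quick check on the size claim above, here is a minimal JDK-only sketch showing that a yyyyMMddHHmm key does not fit in a 32-bit int while the split yyyyMMdd and HHmm keys do. The class and method names (SortKeyRange, fitsInInt, format) are hypothetical and not part of Lucene, and the sample timestamp is simply the date of this thread. Note that in SimpleDateFormat an uppercase DD means day of year, so the sketch uses lowercase dd for day of month.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Lucene class.
public class SortKeyRange {

    // True when the numeric sort key can be held in a 32-bit int.
    public static boolean fitsInInt(String key) {
        long value = Long.parseLong(key);
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    // Formats a date with the given pattern (lowercase dd = day of month).
    public static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        // Assumed sample value: March 20, 2007, 16:03 local time.
        Date sample = new GregorianCalendar(2007, Calendar.MARCH, 20, 16, 3).getTime();
        for (String pattern : new String[] {"yyyyMMddHHmm", "yyyyMMdd", "HHmm"}) {
            String key = format(pattern, sample);
            System.out.println(pattern + " -> " + key + " fitsInInt=" + fitsInInt(key));
        }
    }
}
```

The combined key has twelve digits, which is beyond Integer.MAX_VALUE (2147483647), so only the two split fields are candidates for an int-typed sort.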
>
> To summarize:
>
> // The sorting fields look like this:
> new Field("SortDateTime", sdfDateTime.format(dMySortDateTime),
>     Field.Store.YES, Field.Index.UN_TOKENIZED);
> new Field("SortDate", sdfDate.format(dMySortDateTime),
>     Field.Store.YES, Field.Index.UN_TOKENIZED);
> new Field("SortTime", sdfTime.format(dMySortDateTime),
>     Field.Store.YES, Field.Index.UN_TOKENIZED);
>
> // and the performance looks like this:
>
> // sort by score
> Sort sSortOrder = Sort.RELEVANCE; // 1200ms
>
> // sort by datetime
> Sort sSortOrder = new Sort("SortDateTime", true); // 3900ms
>
> // sort by date then time
> Sort sSortOrder = new Sort(new SortField[] {
>     new SortField("SortDate", SortField.INT, bReverse),
>     new SortField("SortTime", SortField.INT, bReverse)}); // 4300ms
>
> The two indexes that are being searched at the moment look like this:
>
> Index 1:
> Index Path: /storage/unisearch/MMS_index/2007.02/
> Index Size on Disk: 1,400,569 KB
> Number of Records: 2682238
> Index Version: 03/13/2007
>
> Index 2:
> Index Path: /storage/unisearch/MMS_index/2007.03/
> Index Size on Disk: 2,055,199 KB
> Number of Records: 3457434
> Index Version: 03/13/2007
>
> The search is being performed in Tomcat and I'm running
> org.apache.lucene - build 2007-02-14 on a Dual 3.4GHz Xeon w/ 2GB
> memory and Red Hat 3.4.3-22.
>
> So, on to the question: is this fast, slow, or normal?
>
> Along with the obvious follow-up: if it's slow, how can I make it
> faster?
>
> Thanks for your help!
>
> -Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
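
On the question of how a cached searcher implementation might look, here is a minimal JDK-only sketch of the general pattern. It is not Lucene's API. It keeps one shared instance and reopens it only when a version check reports that the underlying index has changed. CachedHolder, the opener and the version supplier are hypothetical names; in a Lucene application the opener would presumably construct the searcher and the version would come from the index. Closing the stale instance once in-flight searches have finished is left out.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder, not a Lucene class: caches one expensive resource
// (for example a searcher) and reopens it only when the version changes.
public class CachedHolder<T> {
    private final Supplier<T> opener;          // assumed: opens a fresh instance
    private final LongSupplier currentVersion; // assumed: reports the index version
    private T cached;
    private long cachedVersion;

    public CachedHolder(Supplier<T> opener, LongSupplier currentVersion) {
        this.opener = opener;
        this.currentVersion = currentVersion;
    }

    // Returns the cached instance, reopening it first if the version moved.
    public synchronized T get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            cached = opener.get();
            cachedVersion = version;
        }
        return cached;
    }
}
```

Held in a static field or in the servlet context, one such holder per index would let successive sorts on the same field reuse the same searcher, at the cost of a reopen after each index change. With an index that changes every second, a minimum interval between reopens could be added; whether that trade-off helps here is an assumption to measure, not a recommendation.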