tom wa writes: > From: Erik Hatcher > > On Jan 29, 2004, at 5:08 AM, tom wa wrote: > > > I'm trying to create an index which can also be searched with date > > > ranges. My first attempt using the Lucene date format ran in to > > > trouble after my index grew and I couldn't search over more than a few > > > days. > > > the suggestion > > > seemed to be to use strings of the format yyyyMMdd. Using that format > > > worked great until I remembered that my search needs > > > to be able to support different timezones. Adding the hour to my > > > field causes the same problem above and my queries stop working when > > > using a range of about 2 months. > > > > When you say you couldn't search and that it stopped working, do you > > mean it was just unacceptably slow? > > > > (Sorry it's taken me a while to reply) > > It wasn't slow, my timeout is far greater than the time it takes to come back with > no hits. > > A small example of a query would be (date: [200306081900 TO 200306201200]) AND > (text: sometext) and this will return zero hits. The index contains about 1000 items > for each 24hr period and the total number of documents was about 150k. I had the > same results when using Lucene's built in date format too. If you think it should be > able to cope with what I am trying to do then I'll take another look. > An alternative to using date ranges or date filters is to use an aproach similar to the recently introduced sort on a integer field (cvs only, so far).
That is, - create an array of the dates of all documents - extend the low level search, in a way that it uses this array and a upper and lower limit to do an additional selection (that's similar to what the filter does) The advantage over a filter is, that you can use the same array for arbitrary date ranges while a filter is specific to a date range. OTOH the array needs to be newly created whenever the index changes. The cost depends on the number of different dates and the array size of course. I did some tests and found, that it takes less than .1 seconds on a P4 2400 Mhz to create such an array for ~ 100000 documents, ~ 10000 different dates. So it depends a bit on how often your index changes, if that's a good way. Another disadvantage is, that you will have to dig a little bit deeper into lucenes search classes. And memory usage might get a problem, once you exceed a few million documents. Morus --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
