[ This patch fixes a small problem in the one posted on Feb. 14 ] According to Geoff Hutchison: > On Thu, 8 Feb 2001, Richard Seymour wrote: > > Now the client wants to be able to search based on, not just the text in > > the documents, but also a range of dates. > > This has been requested several times before (date-restricted searching), > but has not been implemented at the moment. There was some not-complete > code around by Mike Grommet: > > (start of the thread) > <http://htdig.sourceforge.net/htdig-dev/1999/03/0105.html> > > But at the moment, it hasn't been finalized AFAIK. Certainly anyone > willing to pick up the pieces would gain vast riches and wide acclaim. Aw, shucks, if you put it that way... Really, though, this had been nagging at me for a while, so I finally decided to tackle it. I sorted through the old messages, and the patch archive. Joe had archived one of Mike's early attempts, from April 1, 1999, but didn't have the more recent April 6, 1999 patch. I grabbed that one, which I had fortunately saved, and fixed the header that BeroList had mangled so I could extract it. It was a patch for 3.1.1, but it applied to 3.1.5. Mike had gotten something functional working, and had planned for a lot of user input contingencies, but never seemed to get the changes into createURL and setVariables to pass on the parameters. In subsequent discussions he got bogged down in how to set up select lists for input parameters, and in the end he never posted a patch for propagating the start and end date parameters to subsequent search forms. Setting up select lists is easy, now, with the build_select_lists attribute, so I just added code to propagate the simple numeric parameters. I also added a number of fixes, some of which had been discussed prior to Mike's patch, but never implemented, and added a bit of documentation. The fixes are: using server's local time zone, initialising the structures fully, handling 2 digit years, setting the end time to the last second of the given day, and correctly handling Feb. 29. The code now depends on a working mktime() function in your C library. If you don't have it, C code for it exists in htlib/mktime.c, but this isn't compiled by the Makefile. The patch can be applied in the main htdig-3.1.5 source directory using "patch -p0 < this-file". I don't have a sample search form, but that should be easy to figure out. I'll try to adapt this for 3.2.0b3 once I can get the latest snapshot and build it. Be aware that in 3.1.x, the date range selection will trigger the same slowdown as sorting by date or a non-zero backlink_factor or date_factor, when there are lots of initial matches to process, as explained in FAQ 5.10. I hope many people find this useful. My thanks to Mike Grommet for his early work on this, and to all who prodded me into updating and finishing it. --- htdoc/hts_form.html.orig Thu Feb 24 20:29:10 2000 +++ htdoc/hts_form.html Wed Feb 14 15:02:14 2001 @@ -143,13 +143,26 @@ make this item a drop down menu so the user can select the type of sort at search time. </dd> + <dt> + <b>startyear</b>, <b>startmonth</b>, <b>startday</b>, + <b>endyear</b>, <b>endmonth</b>, <b>endday</b> + </dt> + <dd> + These values specify the allowed range of document + modification dates allowed in the search results. + They can be used to restrict the search + to particular "ages" of documents, new or old.<br> + The default is the full range of documents in the database. + These values can also be specified by configuration attributes + of the same names in the configuration file. + </dd> </dl> <hr size="4" noshade> <address> <a href="author.html">Andrew Scherpbier <[EMAIL PROTECTED]></a> </address> -Last modified: $Date: 2000/02/17 22:05:21 $ +Last modified: $Date: 2001/02/14 15:01:31 $ </body> </html> --- htdoc/hts_templates.html.orig Thu Feb 24 20:29:10 2000 +++ htdoc/hts_templates.html Wed Feb 14 15:05:50 2001 @@ -375,6 +375,14 @@ the right. </dd> <dt> + <b>STARTYEAR</b>, <b>STARTMONTH</b>, <b>STARTDAY</b>, + <b>ENDYEAR</b>, <b>ENDMONTH</b>, <b>ENDDAY</b> + </dt> + <dd> + The currently specified date range for restricting search + results. + </dd> + <dt> <b>SYNTAXERROR</b> </dt> <dd> @@ -410,7 +418,7 @@ <a href="author.html">Andrew Scherpbier <[EMAIL PROTECTED]></a> </address> -Last modified: $Date: 2000/02/15 22:08:36 $ +Last modified: $Date: 2001/02/14 15:05:12 $ </body> </html> --- htsearch/htsearch.cc.orig Thu Feb 24 20:29:11 2000 +++ htsearch/htsearch.cc Wed Feb 14 12:33:45 2001 @@ -180,6 +180,27 @@ main(int ac, char **av) if (input.exists("sort")) config.Add("sort", input["sort"]); + // Changes added 3-31-99, by Mike Grommet + // Check form entries for starting date, and ending date + // Each date consists of a month, day, and year + + if (input.exists("startmonth")) + config.Add("startmonth", input["startmonth"]); + if (input.exists("startday")) + config.Add("startday", input["startday"]); + if (input.exists("startyear")) + config.Add("startyear", input["startyear"]); + + if (input.exists("endmonth")) + config.Add("endmonth", input["endmonth"]); + if (input.exists("endday")) + config.Add("endday", input["endday"]); + if (input.exists("endyear")) + config.Add("endyear", input["endyear"]); + + // END OF CHANGES BY MIKE GROMMET + + minimum_word_length = config.Value("minimum_word_length", minimum_word_length); StringList form_vars(config["allow_in_form"], " \t\r\n"); --- htsearch/Display.cc.orig Thu Feb 24 20:29:11 2000 +++ htsearch/Display.cc Wed Feb 14 15:47:35 2001 @@ -423,6 +423,12 @@ Display::setVariables(int pageNumber, Li } else { vars.Add("CGI", new String(getenv("SCRIPT_NAME"))); } + vars.Add("STARTYEAR", new String(config["startyear"])); + vars.Add("STARTMONTH", new String(config["startmonth"])); + vars.Add("STARTDAY", new String(config["startday"])); + vars.Add("ENDYEAR", new String(config["endyear"])); + vars.Add("ENDMONTH", new String(config["endmonth"])); + vars.Add("ENDDAY", new String(config["endday"])); String *str; char *format = input->get("format"); @@ -631,6 +637,18 @@ Display::createURL(String &url, int page url << "keywords=" << encodeInput("keywords") << ';'; if (input->exists("words")) url << "words=" << encodeInput("words") << ';'; + if (input->exists("startyear")) + url << "startyear=" << encodeInput("startyear") << ';'; + if (input->exists("startmonth")) + url << "startmonth=" << encodeInput("startmonth") << ';'; + if (input->exists("startday")) + url << "startday=" << encodeInput("startday") << ';'; + if (input->exists("endyear")) + url << "endyear=" << encodeInput("endyear") << ';'; + if (input->exists("endmonth")) + url << "endmonth=" << encodeInput("endmonth") << ';'; + if (input->exists("endday")) + url << "endday=" << encodeInput("endday") << ';'; StringList form_vars(config["allow_in_form"], " \t\r\n"); for (i= 0; i < form_vars.Count(); i++) { @@ -1008,6 +1026,179 @@ Display::buildMatchList() double date_factor = config.Double("date_factor"); SortType typ = sortType(); + + // Additions made here by Mike Grommet 4-1-99 ... + + tm startdate; // structure to hold the startdate specified by the user + tm enddate; // structure to hold the enddate specified by the user + + time_t eternity = ~(1<<(sizeof(time_t)*8-1)); // will be the largest value +holdable by a time_t + tm *endoftime; // the time_t eternity will be converted into a tm, held by +this variable + + time_t timet_startdate; + time_t timet_enddate; + int monthdays[] = {31,28,31,30,31,30,31,31,30,31,30,31}; + + // boolean to test to see if we need to build date information or not + int dategiven = ((config.Value("startmonth")) || + (config.Value("startday")) || + (config.Value("startyear")) || + (config.Value("endmonth")) || + (config.Value("endday")) || + (config.Value("endyear"))); + + // find the end of time + endoftime = gmtime(&eternity); + + if(dategiven) // user specified some sort of date information + { + time_t now = time((time_t *)0); // fill in all fields for mktime + tm *lt = localtime(&now); // - Gilles's fix + startdate = *lt; + enddate = *lt; + + // set up the startdate structure + // see man mktime for details on the tm structure + startdate.tm_sec = 0; + startdate.tm_min = 0; + startdate.tm_hour = 0; + startdate.tm_yday = 0; + startdate.tm_wday = 0; + + // The concept here is that if a user did not specify a part of a date, + // then we will make assumtions... + // For instance, suppose the user specified Feb, 1999 as the start + // range, we take steps to make sure that the search range date starts + // at Feb 1, 1999, + // along these same lines: (these are in MM-DD-YYYY format) + // Startdates: Date Becomes + // 01-01 01-01-1970 + // 01-1970 01-01-1970 + // 04-1970 04-01-1970 + // 1970 01-01-1970 + // These things seem to work fine for start dates, as all months have + // the same first day however the ending date can't work this way. + + if(config.Value("startmonth")) // form input specified a start month + { + startdate.tm_mon = config.Value("startmonth") - 1; + // tm months are zero based. They are passed in as 1 based + } + else startdate.tm_mon = 0; // otherwise, no start month, default to 0 + + if(config.Value("startday")) // form input specified a start day + { + startdate.tm_mday = config.Value("startday"); + // tm days are 1 based, they are passed in as 1 based + } + else startdate.tm_mday = 1; // otherwise, no start day, default to 1 + + // year is handled a little differently... the tm_year structure + // wants the tm_year in a format of year - 1900. + // since we are going to convert these dates to a time_t, + // a time_t value of zero, the earliest possible date + // occurs Jan 1, 1970. If we allow dates < 1970, then we + // could get negative time_t values right??? + // (barring minor timezone offsets west of GMT, where Epoch is 12-31-69) + + if(config.Value("startyear")) // form input specified a start year + { + startdate.tm_year = config.Value("startyear") - 1900; + if (startdate.tm_year < 69-1900) // correct for 2-digit years 00-68 + startdate.tm_year += 2000; // - Gilles's fix + if (startdate.tm_year < 0) // correct for 2-digit years 69-99 + startdate.tm_year += 1900; + } + else startdate.tm_year = 1970-1900; + // otherwise, no start day, specify start at 1970 + + // set up the enddate structure + enddate.tm_sec = 59; // allow up to last second of end day + enddate.tm_min = 59; // - Gilles's fix + enddate.tm_hour = 23; + enddate.tm_yday = 0; + enddate.tm_wday = 0; + + if(config.Value("endmonth")) // form input specified an end month + { + enddate.tm_mon = config.Value("endmonth") - 1; + // tm months are zero based. They are passed in as 1 based + } + else enddate.tm_mon = 11; // otherwise, no end month, default to 11 + + if(config.Value("endyear")) // form input specified a end year + { + enddate.tm_year = config.Value("endyear") - 1900; + if (enddate.tm_year < 69-1900) // correct for 2-digit years 00-68 + enddate.tm_year += 2000; // - Gilles's fix + if (enddate.tm_year < 0) // correct for 2-digit years 69-99 + enddate.tm_year += 1900; + } + else enddate.tm_year = endoftime->tm_year; + // otherwise, no end year, specify end at the end of time allowable + + // Months have different number of days, and this makes things more + // complicated than the startdate range. + // Following the example above, here is what we want to happen: + // Enddates: Date Becomes + // 04-31 04-31-endoftime->tm_year + // 05-1999 05-31-1999, may has 31 days... we want to +search until the end of may so... + // 1999 12-31-1999, search until the end of the year + + if(config.Value("endday")) // form input specified an end day + { + enddate.tm_mday = config.Value("endday"); + // tm days are 1 based, they are passed in as 1 based + } + else + { + // otherwise, no end day, default to the end of the month + enddate.tm_mday = monthdays[enddate.tm_mon]; + if (enddate.tm_mon == 1) // February, so check for leap year + if (((enddate.tm_year+1900) % 4 == 0 && + (enddate.tm_year+1900) % 100 != 0) || + (enddate.tm_year+1900) % 400 == 0) + enddate.tm_mday += 1; // Feb. 29 - Gilles's fix + } + + // Convert the tm values into time_t values. + // Web servers specify modification times in GMT, but htsearch + // displays these modification times in the server's local time zone. + // For consistency, we would prefer to select based on this same + // local time zone. - Gilles's fix + + timet_startdate = mktime(&startdate); + timet_enddate = mktime(&enddate); + + // I'm not quite sure what behavior I want to happen if + // someone reverses the start and end dates, and one of them is invalid. + // for now, if there is a completely invalid date on the start or end + // date, I will force the start date to time_t 0, and the end date to + // the maximum that can be handled by a time_t. + + if(timet_startdate < 0) + timet_startdate = 0; + if(timet_enddate < 0) + timet_enddate = eternity; + + // what if the user did something really goofy like choose an end date + // that's before the start date + + if(timet_enddate < timet_startdate) // if so, then swap them so they are in +order + { + time_t timet_temp = timet_enddate; + timet_enddate = timet_startdate; + timet_startdate = timet_temp; + } + } + else // no date was specifed, so plug in some defaults + { + timet_startdate = 0; + timet_enddate = eternity; + } + + // ... MG + results->Start_Get(); while ((id = results->Get_Next())) { @@ -1054,9 +1245,21 @@ Display::buildMatchList() // We want older docs to have smaller values and the // ultimate values to be a reasonable size (max about 100) - if (date_factor != 0.0 || backlink_factor != 0.0 || typ != SortByScore) + // New check added on whether or not we need to check date ranges - MG + if (date_factor != 0.0 || backlink_factor != 0.0 || typ != SortByScore + || timet_startdate > 0 || enddate.tm_year < endoftime->tm_year) { DocumentRef *thisRef = docDB[thisMatch->getURL()]; + + // code added by Mike Grommet for date search ranges + // check for valid date range. toss it out if it isn't relevant. + if(thisRef->DocTime() < timet_startdate || thisRef->DocTime() > +timet_enddate) + { + delete thisMatch; + delete thisRef; + continue; + } + if (thisRef) // We better hope it's not null! { score += date_factor * -- Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]> Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil Dept. Physiology, U. of Manitoba Phone: (204)789-3766 Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 _______________________________________________ htdig-general mailing list <[EMAIL PROTECTED]> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe FAQ: http://htdig.sourceforge.net/FAQ.html

