OK, I have applied the patch supplied by Gilles, and it fixes the problem of
the date meta tag not being used in v3.1.6. However, I have now come across
a more subtle problem, which boils down to whether or not time zone
information is taken into account. In htsearch/Display.cc, the DocTime
variable is assumed to be in UTC, since modification dates returned by a
server are in UTC. Meta date tags, however, are most likely written in the
local time zone. So when DocTime is compared with timet_startdate and
timet_enddate on lines 1417 and 1418 of Display.cc, a local time (DocTime)
is being compared with UTC times (timet_startdate and timet_enddate, which
lines 1329 and 1330 compute with mktime(); mktime() interprets the requested
dates as local time and converts them to UTC). Combined with the fact that a
YYYY-MM-DD date string in a meta tag is read as YYYY-MM-DD 00:00:00, this
means that a search with a start date can miss a whole day's worth of files
whose dates come from meta tags (at least in a time zone west of UTC, where
local midnight falls after UTC midnight).
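
To make the mismatch concrete, here is a small C++ sketch (not code from
Display.cc) that redoes both calculations with plain arithmetic. It assumes
a fixed offset from UTC passed in as utc_offset_sec instead of reading the
system time zone; meta_doc_time and start_time_utc are hypothetical names,
and days_from_civil is Howard Hinnant's civil-date algorithm.

```cpp
#include <cassert>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil). Pure arithmetic, no time zone.
long long days_from_civil(int y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;                                    // Jan/Feb belong to the previous year
    const int era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097LL + static_cast<long long>(doe) - 719468;
}

// What DocTime effectively holds for a meta tag "YYYY-MM-DD":
// YYYY-MM-DD 00:00:00 treated as if it were UTC.
long long meta_doc_time(int y, unsigned m, unsigned d) {
    return days_from_civil(y, m, d) * 86400;
}

// What mktime() yields for the start date: local midnight converted to UTC.
// utc_offset_sec is local time minus UTC, e.g. -5 * 3600 for UTC-5.
long long start_time_utc(int y, unsigned m, unsigned d, long utc_offset_sec) {
    return days_from_civil(y, m, d) * 86400 - utc_offset_sec;
}
```

With utc_offset_sec = -5 * 3600 (UTC-5), a meta date of 2002-01-15 comes out
five hours earlier than a start date of 2002-01-15, so a
DocTime >= timet_startdate test rejects the file.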

There are a few possible solutions:

Tell people they should use UTC in their meta tags. All in all, not a bad
solution, but less than ideal for sites that already have local-time meta
tags in place.

In parsedcdate, assume that dates without a time of day are at noon
instead of midnight. As long as no site is more than 12 hours ahead of or
behind UTC, this is actually a very easy hack, er, solution.
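
A sketch of the noon variant, under the same assumptions as before (fixed
offset, hypothetical names) plus an assumed search window of local 00:00:00
to 23:59:59 for a single day; the real end-of-range handling in htsearch may
differ. Under that window a noon timestamp lands inside the correct local
day for offsets from UTC-12 up to, but not including, UTC+12. Some zones,
such as UTC+13 and UTC+14, do fall outside that range.

```cpp
#include <cassert>

// Days since 1970-01-01 (Howard Hinnant's days_from_civil), no time zone.
long long days_from_civil(int y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;
    const int era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097LL + static_cast<long long>(doe) - 719468;
}

// Hypothetical parsedcdate behaviour: a date-only meta tag is stored as
// 12:00:00 on that day (still treated as if it were UTC).
long long meta_doc_time_noon(int y, unsigned m, unsigned d) {
    return days_from_civil(y, m, d) * 86400 + 12 * 3600;
}

// Assumed window for one local calendar day, converted to UTC:
// [local 00:00:00, local 23:59:59]. utc_offset_sec = local time minus UTC.
bool in_local_day(long long doc_time, int y, unsigned m, unsigned d,
                  long utc_offset_sec) {
    const long long start = days_from_civil(y, m, d) * 86400 - utc_offset_sec;
    const long long end = start + 86399;
    return doc_time >= start && doc_time <= end;
}
```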

In parsedcdate, convert the parsed time to UTC by subtracting the current
time zone offset (the difference between local time and UTC, which can be
computed with gmtime() and mktime()). This assumes, however, that the
server and the search engine are in the same time zone.
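
One way to get the offset using only standard C/C++ time functions is
sketched below; current_utc_offset and meta_time_to_utc are hypothetical
names. The offset in effect today may differ from the one in effect on the
document's date because of daylight saving time, and the trick can be off
by an hour right around a DST change; calling mktime() directly on the
parsed date fields with tm_isdst = -1 would apply the offset for that date
instead.

```cpp
#include <cassert>
#include <ctime>

// Offset of local time from UTC right now, in seconds (local minus UTC).
// gmtime() splits "now" into UTC fields; mktime() then reads those same
// fields as local time, so the two timestamps differ by the offset.
long current_utc_offset() {
    const std::time_t now = std::time(nullptr);
    const std::tm* utc = std::gmtime(&now);
    assert(utc != nullptr);
    std::tm fields = *utc;
    fields.tm_isdst = -1;                      // let mktime() decide on DST
    const std::time_t as_if_local = std::mktime(&fields);
    assert(as_if_local != static_cast<std::time_t>(-1));
    return static_cast<long>(std::difftime(now, as_if_local));
}

// Hypothetical fix-up in parsedcdate: the parsed value holds local
// wall-clock fields stored as if they were UTC; removing the offset
// gives the true UTC instant.
std::time_t meta_time_to_utc(std::time_t naive_doc_time, long utc_offset_sec) {
    return naive_doc_time - utc_offset_sec;
}
```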

Create a separate value for each document indicating whether the date was
obtained from the modification time or from the meta tag.
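
A minimal sketch of the flag idea, assuming a hypothetical DocDate record
(not an existing ht://Dig structure) and an offset that htsearch already
knows: only meta-tag dates are shifted to UTC before the range comparison,
while modification times are used as they are.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical per-document record; names are illustrative only.
enum class DateSource { ModificationTime, MetaTag };

struct DocDate {
    std::time_t value;   // UTC for ModificationTime; local wall-clock
                         // stored as if UTC for MetaTag
    DateSource source;
};

// Normalise to true UTC before comparing with the start/end dates.
// utc_offset_sec = local time minus UTC (assumed known to htsearch).
std::time_t comparable_time(const DocDate& d, long utc_offset_sec) {
    return d.source == DateSource::MetaTag ? d.value - utc_offset_sec
                                           : d.value;
}

bool in_range(const DocDate& d, std::time_t start, std::time_t end,
              long utc_offset_sec) {
    const std::time_t t = comparable_time(d, utc_offset_sec);
    assert(start <= end);
    return t >= start && t <= end;
}
```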

Create a separate value that holds only the document modification date,
and keep populating the current DocTime value as it is now. An additional
advantage is that, currently, if the meta date tag on a file stays the
same, I don't think the file ever gets reindexed, even though its contents
could have changed. Separating the two dates would prevent this. This seems
like the best course of action, though it has the disadvantage of
increasing the size of the database for all files, even though only a
limited number use the additional date information. I suppose the meta date
could be stored only for files that have one, though.
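
A sketch of the separate-value approach, using C++17 std::optional so that
"no meta date" is represented explicitly; in an on-disk record the field
could simply be omitted when empty. DocDates, needs_reindex, and search_date
are hypothetical names, and the reindex rule is an assumption about how the
modification time would be used.

```cpp
#include <cassert>
#include <ctime>
#include <optional>

// Hypothetical record: the server's modification time drives reindexing,
// the meta-tag date (only if present) drives date-range searches.
struct DocDates {
    std::time_t modified;              // from the server (UTC)
    std::optional<std::time_t> meta;   // from a date meta tag, if any
};

// Reindex whenever the server reports a newer modification time,
// regardless of whether the meta date changed.
bool needs_reindex(const DocDates& stored, std::time_t server_modified) {
    return server_modified > stored.modified;
}

// Date used for start/end filtering: the meta date when present,
// otherwise the modification time (like today's DocTime).
std::time_t search_date(const DocDates& d) {
    assert(d.modified >= 0);
    return d.meta.value_or(d.modified);
}
```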

Opinions, anyone?

                        Bill Knox
                        Senior Operating Systems Programmer/Analyst
                        The MITRE Corporation






_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
