According to Geoff Hutchison:
> On Sun, 23 Dec 2001, David Melton wrote:
> > Is there any way to change the way that ht://Dig determines the date
> > for a file that it's searching?  In the case of the list archive, it
> > would be simple to write an external script to get the date from the
> > message's X-Date: header.  I would expect that this might be widely
> > useful.  In the case of my other application, I could also write a
> > simple program to extract a date from the html file.
> 
> Sure. This has come up several times. There's even an HTML META tag to
> handle dates. So if you can get a <META name="date" ...> tag into the
> documents, then you'll be fine.
> ftp://ftp.ccsf.org/htdig-patches/3.1.5/SortMetaDate.0
> 
> Offhand, I don't know if the use_doc_date attribute has been added to the
> 3.1.6 snapshots, but if not, I'll make sure it's in there when I get back
> from vacation.

Yes, it's in 3.1.6.  3.1.6 also adds support for Dublin Core date fields,
e.g. name="dc.date" and a few others, not just name="date".
If you can get the site you're indexing to generate these meta
tags from the X-Date field, that should do the trick for you.

> > My other case could be more of a problem, since the historical files
> > contain data going back to 1757, which is a long time before 1970...
> 
> I don't know what standards are available for this. I know some systems
> have time_t as a signed variable type, so it can count before Jan 1, 1970,
> but it's not cross-platform. (Similarly, not all UNIX-like platforms have
> switched to 64-bit times, and older platforms will of course hit the 2038
> barrier.)

Yeah, to handle dates going back that far, you'd need a system with a
64-bit signed time_t (a 32-bit signed time_t only reaches back to
December 1901), as well as gmtime()/localtime() and strftime() functions
that handle negative time_t values as pre-1970 dates.  You'd also
likely need to make a few tweaks to parsedcdate() in 3.1.6's Retriever.cc
so it allows years before 1900, and so it does the 64-bit arithmetic
correctly.
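
To find out whether a given platform meets those requirements, something
like the following rough sketch might help.  It is not ht://Dig code: the
helper names (days_from_civil, time_t_is_signed_64, format_date) are made
up for illustration, and it assumes the proleptic Gregorian calendar and
UTC.  It computes the seconds-since-1970 value for a date by hand, then
asks the C library's gmtime() and strftime() to turn it back into text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Days between 1970-01-01 and the given proleptic Gregorian date
 * (negative for earlier dates).  Hypothetical helper, not part of ht://Dig. */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    y -= (m <= 2);                       /* treat Jan/Feb as end of previous year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int64_t yoe = y - era * 400;                                  /* [0, 399] */
    int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* [0, 365] */
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Returns 1 if time_t on this platform is signed and at least 64 bits wide. */
static int time_t_is_signed_64(void)
{
    return ((time_t)-1 < (time_t)0) && sizeof(time_t) >= 8;
}

/* Formats midnight UTC of the given date as YYYY-MM-DD using gmtime() and
 * strftime().  Returns 1 on success, 0 if time_t is too narrow or unsigned,
 * or if the C library refuses to convert the value. */
static int format_date(int y, int m, int d, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (!time_t_is_signed_64())
        return 0;

    time_t t = (time_t)(days_from_civil(y, m, d) * 86400);
    struct tm *tm = gmtime(&t);          /* may return NULL for negative values */
    if (tm == NULL)
        return 0;

    return strftime(buf, len, "%Y-%m-%d", tm) > 0;
}
```

Calling format_date(1757, 1, 1, buf, sizeof buf) from a small main() and
printing buf on success is enough to test a platform.  If the call
returns 0, either time_t is too small or the library won't convert
negative values, and the pre-1970 dates won't survive indexing there
without more work.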

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
