Good morning, I have quite a weird problem with indexing about 8000 PDFs.
The files are indexed through a local_urls setting, which works perfectly (every file is found as the local equivalent of its URL), but htdig always reports every file as changed.
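For a converter to date its output from the original PDF, it has to know where that PDF lives locally. Below is a minimal Python sketch of a local_urls-style prefix mapping. It assumes entries are tried in order and the first matching URL prefix is swapped for its directory prefix. The helper name `url_to_local` and the example URL and paths are hypothetical. This is not htdig's own matching code.

```python
from typing import List, Optional, Tuple

# Hypothetical example mapping in the spirit of local_urls: each pair is
# (URL prefix, local directory prefix). Not read from a real htdig.conf.
MAPPINGS: List[Tuple[str, str]] = [
    ("http://www.example.com/docs/", "/var/www/docs/"),
]

def url_to_local(url: str, mappings: List[Tuple[str, str]]) -> Optional[str]:
    """Return the local path for url, or None if no prefix matches."""
    # Try entries in order; the first matching URL prefix wins (assumption).
    for url_prefix, dir_prefix in mappings:
        if url.startswith(url_prefix):
            # Swap the URL prefix for the directory, keep the rest of the path.
            return dir_prefix + url[len(url_prefix):]
    return None

print(url_to_local("http://www.example.com/docs/2005/report.pdf", MAPPINGS))
```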
To index the PDFs I use an executable PHP script, which in turn calls pdfinfo and pdftotext (both version 3.xx) and queries a database for additional metadata (such as the correct title). All the gathered information is rendered as HTML, which htdig then indexes. The script also adds three meta items, "Last-Modified", "Date" and "DC.Date", to force the modification date. Combined with use_doc_date, that should make it clear to htdig whether or not a document has changed.
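The HTML-rendering step described above can be sketched in Python as follows. It assumes the caller passes the original PDF's modification time, for example `os.stat(pdf_path).st_mtime`, rather than the time of the conversion. The function name `render_html` is hypothetical. The exact meta attribute forms and date formats that htdig's use_doc_date accepts are also assumptions here, not confirmed from its documentation.

```python
import html
from datetime import datetime, timezone
from email.utils import formatdate

def render_html(title: str, text: str, mtime: float) -> str:
    """Wrap extracted PDF text in HTML whose meta dates reflect mtime.

    mtime should be the original PDF's modification time (seconds since
    the epoch), not the time the conversion ran.
    """
    # HTTP-date form, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    http_date = formatdate(mtime, usegmt=True)
    # ISO 8601 in UTC; assumed acceptable for the Date and DC.Date tags.
    iso_date = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "<html><head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f'<meta http-equiv="Last-Modified" content="{http_date}">\n'
        f'<meta name="Date" content="{iso_date}">\n'
        f'<meta name="DC.Date" content="{iso_date}">\n'
        "</head><body>\n"
        # Escape the text so stray "<" or "&" from the PDF can't break the markup.
        f"<pre>{html.escape(text)}</pre>\n"
        "</body></html>\n"
    )

print(render_html("Annual report", "Revenue < costs & rising", 0))
```

Because every date is derived from the single `mtime` argument, the output stays identical between runs as long as the source PDF is untouched.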
I can't figure out why htdig considers the PDFs changed every day (they aren't), but I suspect htdig takes the modification time of the temp file as Last-Modified.
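That suspicion is plausible because a freshly written temporary copy carries the time it was written, not the original's time. The following small Python demonstration shows the effect. It does not reproduce htdig's own code. The file names and the 2001 timestamp are made up, and `shutil.copy` stands in for whatever creates the temp file.

```python
import os
import shutil
import tempfile

def demo_temp_copy_mtime():
    """Return (original mtime, temp-copy mtime) for a freshly copied file."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.pdf")
        with open(original, "wb") as f:
            f.write(b"%PDF-1.4\n")
        # Pretend the PDF was last changed in September 2001.
        old = 1_000_000_000
        os.utime(original, (old, old))

        # Stand-in for a fetch-to-tempfile step: shutil.copy copies the
        # content and permission bits, but not the timestamps.
        temp_copy = os.path.join(d, "htdig-tmp")
        shutil.copy(original, temp_copy)

        return os.stat(original).st_mtime, os.stat(temp_copy).st_mtime

orig, tmp = demo_temp_copy_mtime()
print(orig, tmp > orig)
```

If the script dates its HTML from a temporary file it is handed, it would get a value like the second one on every run. Reading the original PDF's time instead, for example with `os.stat(path).st_mtime` on the path resolved through the local_urls mapping, should keep the date stable between runs.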
Any clues?

Regards,
Wim

--
Wim Kosten <[EMAIL PROTECTED]>
ibuildings.nl BV - information technology
http://www.ibuildings.nl - 0118 42 95 50

ht://Dig Developer mailing list: htdig-dev@lists.sourceforge.net