Good morning,

I have quite a weird problem with indexing about 8000 PDFs.

The files are indexed through a local_urls= setting, which works perfectly (every file is found as the local equivalent of its URL), but htdig always treats every file as changed.

To index the PDFs I use an executable PHP script, which in turn calls pdfinfo and pdftotext (both version 3.xx) and queries a database for some additional meta information (such as the correct title). All gathered information is rendered into HTML, which htdig then indexes. The script also adds three meta items, "Last-Modified", "Date" and "DC.Date", to force the modification date. In conjunction with use_doc_date, this should make it clear to htdig whether or not the document has changed.
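
To make the setup concrete, here is a rough sketch of what the wrapper does, written in Python rather than the actual PHP. The function name render_wrapper, the date formats, and the choice of which format goes into which meta item are assumptions for illustration only; the real script gets its text from pdftotext and its title from the database, and which date formats htdig accepts is not confirmed here.

```python
import html
import os
from datetime import datetime, timezone
from email.utils import formatdate


def render_wrapper(pdf_path, title, body_text):
    """Return an HTML page whose date meta items carry the PDF's own mtime.

    Hypothetical helper: the real converter is a PHP script that takes
    body_text from pdftotext and title from a database lookup.
    """
    # Use the modification time of the original PDF, not of any temp file.
    mtime = os.path.getmtime(pdf_path)

    # Assumed formats: an HTTP-style date and a plain ISO calendar date.
    http_date = formatdate(mtime, usegmt=True)
    iso_date = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")

    return "\n".join([
        "<html><head>",
        "<title>%s</title>" % html.escape(title),
        '<meta name="Last-Modified" content="%s">' % http_date,
        '<meta name="Date" content="%s">' % http_date,
        '<meta name="DC.Date" content="%s">' % iso_date,
        "</head><body>",
        html.escape(body_text),
        "</body></html>",
    ])
```

The point of the sketch is only that the dates in the generated HTML come from the source PDF, so they stay the same from one run to the next as long as the PDF itself is untouched.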

I can't figure out why the PDFs are reported as changed every day (they are not), but I suspect that htdig takes the file time of the temporary file as the last-modified date.

Any clues?

Regards,
Wim

--
Wim Kosten             <[EMAIL PROTECTED]>
ibuildings.nl BV -  information technology
http://www.ibuildings.nl -   0118 42 95 50



_______________________________________________
ht://Dig Developer mailing list:
htdig-dev@lists.sourceforge.net
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
