Using the external converter switch (application/pdf->text/html) I index PDF files, which works perfectly.
However, the PDF files (about 8000) all have undescriptive titles like "Word doc 2" instead of "Proposal for the yearly members meeting".
To show proper titles I use a small script which takes the ASCII dump (-t switch), rewrites it with the correct titles,
and loads it into the DB2 database with htload. After that I run htmerge and htfuzzy and it all seems to work ...
But for some weird reason I can't search for words in those patched titles. If I search for "proposal", my example patched
document is not found.
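
A minimal sketch of that kind of title patch, for illustration only. It assumes each line of the ASCII dump is one document record made of tab-separated fields, with the URL in a field starting with "u:" and the title in a field starting with "t:"; that layout is an assumption here and should be checked against the real dump. The names patch_title and titles are hypothetical.

```python
def patch_title(line, titles):
    """Return one dump line with its title field replaced.

    ASSUMPTION: fields are tab-separated, the URL field starts with "u:"
    and the title field starts with "t:". `titles` maps URL -> new title.
    Lines whose URL is not in `titles` are returned unchanged.
    """
    record = line.rstrip("\n")
    fields = record.split("\t")

    # Find the URL of this record, if any.
    url = None
    for field in fields:
        if field.startswith("u:"):
            url = field[2:]
            break

    new_title = titles.get(url)
    if new_title is None:
        return record

    # Tabs and newlines would break the record layout, so flatten them.
    new_title = " ".join(new_title.split())

    patched = []
    replaced = False
    for field in fields:
        if field.startswith("t:"):
            patched.append("t:" + new_title)
            replaced = True
        else:
            patched.append(field)
    if not replaced:
        # No title field present: append one.
        patched.append("t:" + new_title)
    return "\t".join(patched)
```

A rewrite like this changes only the title field of the record; every other field on the line is passed through untouched.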
I started digging deeper and (just as a test case) used a small script which returns "nanananana" instead of the PDF title returned by pdfinfo,
and guess what: "nanananana" is found. So that seemed to be the solution, but ... when htdig calls the converter it passes the
name of the tmp file (the downloaded PDF) instead of the actual document name. That makes it pretty hard to set the correct title, since I only know the tmp name
and not the PDF file name or the location (URL) of the document.
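
A minimal sketch of the output side of such a test converter, assuming the converter only has to print an HTML page whose title element carries the title to be indexed. How the text is extracted from the PDF and how htdig invokes the converter are not shown; to_html is a hypothetical helper name.

```python
import html


def to_html(title, body_text):
    """Wrap already-extracted text in a minimal HTML page with the given title.

    Both values are HTML-escaped so characters such as "<" or "&" in the
    title or text cannot break the markup.
    """
    return (
        "<html><head><title>%s</title></head>\n"
        "<body><pre>%s</pre></body></html>\n"
        % (html.escape(title), html.escape(body_text))
    )


if __name__ == "__main__":
    # Fixed test title, as in the experiment described above.
    print(to_html("nanananana", "text extracted from the PDF"), end="")
```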
Is it a (good) idea to do the rewrites straight in the DB2 database, and would I have to reindex afterwards, or are there better options? Would it be an idea to use an
external parser instead of the external converter and write my own htdig records?
Cheers,
Wim Kosten
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.) https://lists.sourceforge.net/lists/listinfo/htdig-dev