At 11:57 12/06/01 -0500, Gilles Detillieux spake thusly:
>According to Marcus Valentine:
>> At 17:06 12/06/01 +0100, you (David Adams) wrote:
>> >It seems that doc2html.pl is not calling pdf2html.pl, instead it is reading
>> >the .PDF file itself, as though it were plain text.
>> >
>> >Be sure to include the correct MIME type when calling doc2html.pl from the
>> >command line, this is most important.
>> 
>> Please could you indulge me and explain how to do this?
>
>See http://www.htdig.org/attrs.html#external_parsers for details.
>
>The arguments to external parser or converter scripts are:
>
>  command-name input-file content-type document-URL config-file
>
>E.g.:
>
>  doc2html.pl mydoc.pdf application/pdf http://myhost.uk/mydoc.pdf htdig.conf
>
>I think the last argument is pretty much always ignored in existing
>parser/converter scripts.
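
The argument order above can be tried by hand with a small shell function.
This is only a sketch: run_converter is a made-up name, the MIME-type check
is a crude sanity test, and defaulting the config-file argument to
htdig.conf is an assumption based on the remark that scripts mostly ignore it.

```shell
# Hypothetical helper: call an external converter by hand with the
# arguments listed above (input-file content-type document-URL config-file).
run_converter() {
    if [ "$#" -lt 4 ]; then
        echo "usage: run_converter script input-file content-type url [config-file]" >&2
        return 2
    fi
    rc_script=$1
    rc_input=$2
    rc_type=$3
    rc_url=$4
    rc_config=${5:-htdig.conf}   # assumed default; not taken from htdig itself

    # Crude check that the third argument looks like type/subtype.
    case $rc_type in
        */*) ;;
        *) echo "run_converter: '$rc_type' does not look like a MIME type" >&2
           return 1 ;;
    esac

    # The converter cannot do anything useful without a readable input file.
    if [ ! -r "$rc_input" ]; then
        echo "run_converter: cannot read '$rc_input'" >&2
        return 1
    fi

    "$rc_script" "$rc_input" "$rc_type" "$rc_url" "$rc_config"
}
```

Called as

  run_converter ./doc2html.pl mydoc.pdf application/pdf http://myhost.uk/mydoc.pdf

it should print the converted HTML on standard output if the converter is
working.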

Thanks - this proves my doc2html.pl is working. The next problem is
invoking doc2html.pl from htdig. When htdig spiders the site, I get an
error message like this for each PDF it comes across:

!!      Error: Couldn't open file '/cygdrive/d/htdext.326'

This is to do with the temporary file used to pipe the output from
doc2html.pl to htdig, yes?  I've tried various environment settings of tmp,
tmpdir or whatever the hell it's trying to use, tinkering around at both
the DOS prompt and the bash prompt, to no avail. (Isn't there a similar
issue with htmerge under NT, which thankfully I'm not suffering from?) Can
anyone shed some light on this?
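
One way to narrow this down is to see which temporary-directory variables
the shell actually exports and whether each one names a writable directory.
A rough sketch: check_tmp_dir is a made-up helper, and the choice of TMPDIR,
TMP and TEMP is a guess at what htdig might consult, not something confirmed
here.

```shell
# Hypothetical helper: report whether a directory can hold a temporary file.
check_tmp_dir() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "unset"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Try to create, then remove, a probe file named after the current shell.
    probe="$dir/htdext.probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Assumed candidates; which of these htdig really reads is not known here.
for name in TMPDIR TMP TEMP; do
    eval "value=\${$name:-}"
    printf '%s -> ' "$name"
    check_tmp_dir "$value" || true
done
```

Running it from the same prompt that starts htdig shows the environment
htdig will inherit.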

Also worth mentioning for NT-type punters: the ht*.exe programs appear to
need to be told where their configuration file is, even when it's in the
default location.  That's not a problem for htdig and htmerge, but I call
htsearch from htwrap.pl, so I have modified the call from

system("$dir/htsearch"); 

to

system("$dir/htsearch -c $config"); 

Thanks

Marcus Valentine

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
