At 14:11 14/06/01 -0500, Gilles Detillieux wrote:
>According to Marcus Valentine:
>> invoking doc2html.pl from htdig. When htdig spiders the site, for each pdf
>> it comes across I get an error message like
>>
>> !! Error: Couldn't open file '/cygdrive/d/htdext.326'
>>
>> This is to do with the temporary file used to pipe the output from
>> doc2html.pl to htdig, yes? I've tried various environment settings of tmp,
>> tmpdir or whatever the hell it's trying to use (isn't there a similar issue
>> with htmerge under NT, that thankfully I'm not suffering from) tinkering
>> around with both at the dos prompt and the bash prompt to no avail. Can
>> anyone shed some light on this?
>
>Both htdig and htmerge make use of the TMPDIR environment variable (note
>the name is all caps). That error message seems to be coming from
>pdftotext, though, and not htdig or doc2html.pl. That means that the
>file is being created and htdig is calling doc2html.pl, which in turn
>is calling pdftotext.
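Gilles notes that htdig and htmerge read the all-caps TMPDIR variable. Before chasing the converter, it can help to confirm that whatever TMPDIR is set to actually names a directory you can create files in. The following is a minimal sketch, not part of htdig. It assumes only TMPDIR matters (it does not fall back to TMP or TEMP), and it treats a directory as usable if a scratch file can be created there. The function name `check_tmpdir` is hypothetical, and the `htdext.` prefix merely mirrors the file name in the error above.

```python
import os
import tempfile


def check_tmpdir(env=None):
    """Return (ok, message) saying whether TMPDIR looks usable.

    Hypothetical helper. Assumption: like htdig/htmerge, only the
    all-caps TMPDIR variable is consulted; TMP and TEMP are ignored.
    """
    env = os.environ if env is None else env
    tmpdir = env.get("TMPDIR")
    if not tmpdir:
        return False, "TMPDIR is not set"
    if not os.path.isdir(tmpdir):
        return False, f"TMPDIR={tmpdir!r} is not an existing directory"
    try:
        # Create, write and automatically delete a scratch file, roughly
        # what a converter pipeline needs to be able to do in TMPDIR.
        with tempfile.NamedTemporaryFile(dir=tmpdir, prefix="htdext.") as fh:
            fh.write(b"test")
    except OSError as exc:
        return False, f"cannot create a file in {tmpdir!r}: {exc}"
    return True, f"TMPDIR={tmpdir!r} is usable"


if __name__ == "__main__":
    ok, msg = check_tmpdir()
    print(msg)
```

Run it from the same shell (DOS prompt or bash) that launches htdig, because the two can see different environments.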

All this cygdrive stuff was getting too complicated, since I had /cygwin/bin
in my path. To simplify things, I took /cygwin/bin out of my path and put
cygwin1.dll into its own directory, with that directory in the path.
Next I installed ActiveWare Perl, as this appears to be the Perl of choice
among successful Win32 htdig users. Then I set TMPDIR=d//
Now htdig runs with no errors when it encounters a PDF. For example:

    15:15:1:http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf: size = 69129

But when I run htmerge, I get, for example:

    Deleted, no excerpt: 15/http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf

Is the PDF being indexed or not? Anyone got any ideas?
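To see how many PDFs are affected, the htmerge output can be scanned for these records. This is a minimal sketch, not an htdig tool. It assumes the line shape shown above (`Deleted, no excerpt: <docID>/<URL>`, one record per line), and the helper name `find_no_excerpt` is hypothetical. My reading is that htmerge drops a document whose stored excerpt is empty. For a PDF, that would suggest the external converter returned no text. That interpretation is an assumption, not something confirmed in this thread.

```python
import re

# Assumed record shape, taken from the htmerge output quoted above:
#   "Deleted, no excerpt: 15/http://host/path.pdf"
# Records wrapped across several lines will not be matched.
NO_EXCERPT = re.compile(r"^Deleted, no excerpt:\s*(\d+)/(\S+)$")


def find_no_excerpt(lines):
    """Yield (doc_id, url) for each 'Deleted, no excerpt' record."""
    for line in lines:
        m = NO_EXCERPT.match(line.strip())
        if m:
            yield int(m.group(1)), m.group(2)


if __name__ == "__main__":
    sample = [
        "15:15:1:http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf: size = 69129",
        "Deleted, no excerpt: 15/http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf",
    ]
    for doc_id, url in find_no_excerpt(sample):
        print(doc_id, url)
```

In practice you would feed it the saved output of a verbose htmerge run instead of the sample list.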
Marcus Valentine
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html