At 14:11 14/06/01 -0500, Gilles Detillieux wrote:
>According to Marcus Valentine:
>> invoking doc2html.pl from htdig. When htdig spiders the site, for each pdf
>> it comes across I get an error message like
>> 
>> !!      Error: Couldn't open file '/cygdrive/d/htdext.326'
>> 
>> This is to do with the temporary file used to pipe the output from
>> doc2html.pl to htdig, yes?  I've tried various environment settings of tmp,
>> tmpdir or whatever the hell it's trying to use (isn't there a similar issue
>> with htmerge under NT, that thankfully I'm not suffering from) tinkering
>> around with both at the dos prompt and the bash prompt to no avail. Can
>> anyone shed some light on this?
>
>Both htdig and htmerge make use of the TMPDIR environment variable (note
>the name is all caps).  That error message seems to be coming from
>pdftotext, though, and not htdig or doc2html.pl.  That means that the
>file is being created and htdig is calling doc2html.pl, which in turn
>is calling pdftotext.

All this cygdrive stuff was getting too complicated, as I had /cygwin/bin
in my path. To simplify things, I took cygwin/bin out of my path and put
cygwin1.dll into its own directory, with that directory in the path.

Next I installed ActiveWare Perl, as this appears to be the Perl of choice
among successful Win32 htdig users. Then I set TMPDIR=d//
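
To rule out the temporary directory itself, a check along the following
lines could be run in the same shell that launches htdig. This is only a
rough Python sketch: the function name check_tmpdir is made up for
illustration, and it assumes nothing about how htdig actually creates its
temp files beyond the fact that it reads TMPDIR.

```python
import os
import tempfile


def check_tmpdir(env=None):
    """Return (ok, detail) for the directory named by TMPDIR in `env`."""
    env = os.environ if env is None else env
    path = env.get("TMPDIR")
    if not path:
        return False, "TMPDIR is not set"
    if not os.path.isdir(path):
        return False, "not a directory: " + path
    try:
        # Create and remove a scratch file, roughly as any tool that
        # writes temporary files into this directory would need to.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return False, "cannot create a file in %s: %s" % (path, exc)
    return True, path


if __name__ == "__main__":
    print(check_tmpdir())
```

If the first element of the result is False, the second says which of the
three checks failed (unset, not a directory, or not writable).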

Now htdig runs with no errors when it encounters a PDF. For example:

15:15:1:http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf: size = 69129

But when I run htmerge, I get, for example:

Deleted, no excerpt: 15/http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf

Is the PDF being indexed or not?  Does anyone have any ideas?
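
One thing that might narrow this down is running the converter by hand on
the same file and seeing whether any text comes out at all, since an empty
result would at least be consistent with there being no excerpt to store.
A minimal Python sketch, assuming the converter writes its text to
standard output; converter_text_length is a hypothetical helper name, and
the command line in the comment is just an example, not taken from my
htdig configuration.

```python
import subprocess
import sys


def converter_text_length(command):
    """Run `command` and count the non-whitespace characters on its stdout."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    text = result.stdout.decode("utf-8", errors="replace")
    # Strip all whitespace so a page of blank lines does not count as text.
    return len("".join(text.split()))


if __name__ == "__main__":
    # Example (assumed command line): python check.py pdftotext some.pdf -
    if len(sys.argv) > 1:
        print(converter_text_length(sys.argv[1:]))
```

A result of 0 would mean the converter produced nothing usable for that
file; anything larger means text is coming out and the problem lies
elsewhere.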

Marcus Valentine

