At 09:10 19/06/01 +0100, David Adams wrote:
>Doc2html.pl is not giving you any error messages, so it seems to be working.
>
>Add
>
>DOC2HTML_LOG = ""
>export DOC2HTML_LOG
>
>to the (Bourne shell) script that runs htdig, and doc2html.pl will output a
>line for every file indexed, which will include the number of bytes
>extracted and sent back to htdig.

Sorry - I'm not with you.  At present I'm running htdig from the DOS command
line.  Are you saying I should create a script file and run it from within
Cygwin at the bash prompt? I tried

DOC2HTML_LOG = ""
./htdig -i -c ../conf/htdig.conf
export DOC2HTML_LOG

but the "can't open file /tmp/htdext.???" problem recurs. I had previously
fixed that by running htdig from the DOS command line with TMPDIR=d//
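
For reference, here is a minimal sketch of the ordering that the advice
seems to call for, assuming a Bourne-style shell such as bash under Cygwin.
The assignment has no spaces around "=", and the export comes before htdig
is started, so that htdig and anything it launches inherit the variable.
The child "sh -c" is only a stand-in for htdig, and the TMPDIR value in the
comment is a hypothetical path, not a tested setting.

```shell
#!/bin/sh
# Sketch only: shows that a variable must be assigned and exported
# BEFORE the program that needs it is started.

DOC2HTML_LOG=""        # no spaces around '=' in Bourne-style shells
export DOC2HTML_LOG    # put it in the environment of later child processes

# TMPDIR would be set the same way; the path below is an assumption:
# TMPDIR=/cygdrive/d; export TMPDIR

# Stand-in for htdig: a child shell reports whether it inherited the variable.
child_sees=$(sh -c 'if [ "${DOC2HTML_LOG+set}" = set ]; then echo yes; else echo no; fi')
echo "child process sees DOC2HTML_LOG: $child_sees"

# The real run would follow here, using the command from the message above:
# ./htdig -i -c ../conf/htdig.conf
```

In the attempt above, the export came after htdig had already run, so htdig
would not have seen the variable even if the assignment had been accepted.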

Thanks

>--
>David Adams
>Computing Services
>Southampton University
>
>
>----- Original Message -----
>From: "Marcus Valentine" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Monday, June 18, 2001 4:58 PM
>Subject: Re: [htdig] doc2html.pl version 3 problems under windows NT
>
>
>> At 14:11 14/06/01 -0500, Gilles Detillieux wrote:
>> >According to Marcus Valentine:
>> >> invoking doc2html.pl from htdig. When htdig spiders the site, for each pdf
>> >> it comes across I get an error message like
>> >>
>> >> !!      Error: Couldn't open file '/cygdrive/d/htdext.326'
>> >>
>> >> This is to do with the temporary file used to pipe the output from
>> >> doc2html.pl to htdig, yes?  I've tried various environment settings of tmp,
>> >> tmpdir or whatever the hell it's trying to use (isn't there a similar issue
>> >> with htmerge under NT, that thankfully I'm not suffering from) tinkering
>> >> around with both at the dos prompt and the bash prompt to no avail. Can
>> >> anyone shed some light on this?
>> >
>> >Both htdig and htmerge make use of the TMPDIR environment variable (note
>> >the name is all caps).  That error message seems to be coming from
>> >pdftotext, though, and not htdig or doc2html.pl.  That means that the
>> >file is being created and htdig is calling doc2html.pl, which in turn
>> >is calling pdftotext.
>>
>> All this cygdrive stuff was getting too complicated, as I had /cygwin/bin
>> in my path. To simplify things, I took cygwin/bin out of my path and put
>> cygwin1.dll into its own directory, with that directory in the path.
>>
>> Next I installed ActiveState Perl, as this appears to be the perl of choice
>> of successful win32 htdig users. Then I set TMPDIR=d//
>>
>> Now htdig runs, with no errors when it encounters a pdf.  For example
>>
>> 15:15:1:http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf:
>> size = 69129
>>
>> But when I run htmerge, I get for example
>>
>> Deleted, no excerpt:
>> 15/http://marcusv_pc:8080/toracomm/pdf/DS012_Design_Services.pdf
>>
>> Is the pdf being indexed or not?  Anyone got any ideas?
>>
>> Marcus Valentine


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
