According to Tsai, Jin:
> The HtDig indexing process seems to be ignore all of PDF, Word, Excel
> documents if the page sources are SSL-enabled as https and been handled by
> the external handler.  The html pages seems to be indexed correctly even if
> they are from https web server.
...
> external_protocols:     https /usr/local/bin/handler.pl
> 
> However, all of PDF, Word, Excel, and PPT documents are indexed correctly if
> the page sources are via http, which is handled by the internal HtDig
> indexing handler.

I'd be interested to know how your handler.pl script emits the "t"
record for Content-Type, which htdig relies on to identify each
document's file type.  E.g., for a PDF, it should emit a header record
like this:

t:      application/pdf

with exactly one tab, and no spaces, between "t:" and the MIME type.  If
the script isn't doing that, it may be the cause of your problem.
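
As a rough way to check this, here is a minimal Python sketch that builds
a "t" record and tests whether a line of handler output is well formed.
The names content_type_record and check_t_record are hypothetical helpers
for illustration only, and the pattern encodes nothing more than the rule
above ("t:", one tab, then a MIME type); it is not taken from the htdig
source.

```python
import re

# Assumed shape of a well-formed Content-Type record:
# "t:", exactly one tab, then a type/subtype MIME string with no whitespace.
T_RECORD = re.compile(r"^t:\t([^\s/]+/[^\s/]+)$")


def content_type_record(mime_type):
    """Build the "t" record: "t:", one tab, then the MIME type."""
    return "t:\t" + mime_type


def check_t_record(line):
    """Return the MIME type if line is a well-formed "t" record, else None."""
    match = T_RECORD.match(line.rstrip("\r\n"))
    return match.group(1) if match else None
```

Feeding each header line printed by handler.pl through check_t_record
should return the MIME type for the "t" line; a result of None for that
line would suggest spaces or extra tabs have crept in after the colon.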

> The log of all PDF, Word, Excel, etc documents is recorded in
> /var/log/doc2html.log, and it shows no evidence of any documents been
> indexed if they are from https://.  In addition, htstat -u shows no PDF,
> Word documents been indexed from any https web server.
> 
> The HtDig search engine is version 3.2.0-1.b4.0.72 and is running on RedHat
> Linux v.7.2 with kernel 2.4.9-31.

The htdig-3.2.0-1.b4.0.72 RPM was built from a late October 2001 snapshot
of 3.2.0b4, which had a problem with how the external transport handler
managed the access time object.  I don't know whether that could lead to
the problem you report, but if handler.pl is functioning correctly, it
may be worth rebuilding htdig from a more recent snapshot.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
