Many thanks for your response. There was indeed a trailing space at the end of one of the parser lines ...

Now it works great.

Alain DESEINE.

At 12:42 31/01/2003 +0000, David Adams wrote:
All the messages like

PDF::parseNonTextLine: total pages is 49
PDF::parseNonTextLine: start page 1

are being produced by the internal PDF parser, which means that your
external_parsers: statement is being ignored.
Check that there are no spaces or other characters after the \ at the end of
each line in your configuration file.
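
A trailing space is invisible in most editors, so it can help to let a script
find it. The following is only a minimal sketch, assuming Python 3 is
available; the function names and the Latin-1 encoding are illustrative
assumptions, not part of ht://Dig. It reports lines where a backslash is
followed by nothing but whitespace before the end of the line.

```python
def find_trailing_space_after_backslash(text):
    """Return (line_number, line) pairs where a backslash is followed
    only by whitespace before the end of the line."""
    hits = []
    for number, line in enumerate(text.split("\n"), start=1):
        stripped = line.rstrip()
        # The line looks like a continuation, but something invisible
        # sits between the backslash and the newline.
        if stripped.endswith("\\") and stripped != line:
            hits.append((number, line))
    return hits


def check_file(path, encoding="latin-1"):
    """Print every suspicious continuation line in a configuration file.

    The Latin-1 default is an assumption chosen so that accented
    characters in the file never cause a decoding error.
    """
    with open(path, encoding=encoding) as handle:
        for number, line in find_trailing_space_after_backslash(handle.read()):
            print(f"line {number}: whitespace after trailing backslash: {line!r}")
```

Calling check_file() with the path to your htdig.conf lists the offending
lines, if there are any. It only looks for whitespace after the backslash;
other stray characters still have to be checked by eye.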

David Adams
University of Southampton

----- Original Message -----
From: "Alain DESEINE" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, January 27, 2003 4:42 PM
Subject: [htdig] Pbs seting up PDF support


> Hi,
>
> I have problems setting up PDF support for htdig.
>
> Here is some information about my setup:
>
> linux version : kernel 2.4.10-4GB
> htdig version : 3.1.6 (compiled from source)
> install dir : /opt/www/htdig
> installed French support from Dider Lebrun
>
> htdig works well with HTML files.
>
> I've installed xpdf.
> I've installed pdf2html; it works well from the Linux prompt.
> I've installed doc2html; it works well from the Linux prompt.
> I've modified the htdig.conf file to call the doc2html converter for
> application/pdf files.
>
> When I run rundig, the PDFs are not inserted in the database, and I get these
> messages in the log:
>
> Deleted, no excerpt: 44/http://www.cabinfo.com/documents/pdf/adsl.pdf
> Deleted, no excerpt: 45/http://www.cabinfo.com/documents/pdf/gprs.pdf
> Deleted, no excerpt: 43/http://www.cabinfo.com/documents/pdf/wap.pdf
>
> I've run rundig with the -vvvv flag and got something like this in the log:
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Mon, 27 Jan 2003 15:17:11 GMT
> Header line: Server: Apache
> Header line: Last-Modified: Mon, 20 Jan 2003 17:22:48 GMT
> Converted Mon, 20 Jan 2003 17:22:48 GMT to Mon, 20 Jan 2003 17:22:48
> Header line: ETag: "13b7ac-50cf6-3e2c3068"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 330998
> Header line: Connection: close
> Header line: Content-Type: application/pdf
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> ...
> ...
> Read 8192 from document
> Read 3318 from document
> Read a total of 330998 bytes
> PDF::setContents(330998 bytes)
> PDF::parse(http://www.cabinfo.com/documents/pdf/wap.pdf)
> PDF::parseNonTextLine: title is "��"
>
> title: ��
> PDF::parseNonTextLine: total pages is 49
> PDF::parseNonTextLine: start page 1
> PDF::parseNonTextLine: begin text block
> PDF::parseTextLine("70.5 40.5 TD") cmd=TD
> PDF::parseTextLine("0 0 0 rg") cmd=rg
> PDF::parseTextLine("/N6 9.75 Tf") cmd=Tf
> PDF::parseTextLine("0.08999 Tc") cmd=Tc
> PDF::parseTextLine("0 Tw") cmd=Tw
> PDF::parseTextLine("(\251)Tj ") cmd=
> PDF::parseTextLine("7.5 0 TD") cmd=TD
> PDF::parseTextLine("/N8 9.75 Tf") cmd=Tf
> PDF::parseTextLine("0.11048 Tc") cmd=Tc
> PDF::parseTextLine("0.17898 Tw") cmd=Tw
> PDF::parseTextLine("( Alain DESEINE, 1999)Tj ") cmd=
> PDF::parseTextLine("375.75 693 TD") cmd=TD
> PDF::parseTextLine("/N10 14.25 Tf") cmd=Tf
> PDF::parseTextLine("-0.33178 Tc") cmd=Tc
> ...
>
> and so on for the entire content of the pdf ...
>
> This information suggests to me that the internal parser is being used to
> parse the PDF, and not the doc2html.pl script, but I'm not sure. I've
> browsed and searched the list archive but didn't find anything like that, so
> if you can help me ...
>
> Here is the htdig.conf file.
>
> database_dir:           /home/info/www/htdig/db
> start_url:              http://www.cxabinfo.com/ \
>                          http://www.cxabinfo.com/index2.html
> limit_urls_to:          ${start_url}
> exclude_urls:           /cgi-bin/ .cgi
> #bad_extensions:                .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
> #       .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
> #maintainer:            [EMAIL PROTECTED]
> max_head_length:        10000
> max_doc_size:           2000000
> no_excerpt_show_top:    true
> search_algorithm:       exact:1 synonyms:0.5 endings:0.1
> # template_map:         cxabinfo cxabinfo /opt/www/htdig/common/htdig_template.html
> template_map:           cxabinfo cxabinfo ${common_dir}/htdig_template.html
> # template_name: cxabinfo
> search_results_header:  /opt/www/htdig/common/htdig_header.html
> search_results_footer:
> nothing_found_file:     /opt/www/htdig/common/htdig_nomatch.html
> syntax_error_file:      /opt/www/htdig/common/htdig_syntaxerror.html
> next_page_text:         <img src="/htdig/buttonr.gif" border="0" align="middle" width="30" height="30" alt="next">
> no_next_page_text:
> prev_page_text:         <img src="/htdig/buttonl.gif" border="0" align="middle" width="30" height="30" alt="prev">
> no_prev_page_text:
> external_parsers:      application/rtf->text/html /opt/www/htdig/bin/doc2html.pl \
>                         text/rtf->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/pdf->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/postscript->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/msword->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/wordperfect5.1->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/msexcel->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/vnd.ms-excel->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/vnd.ms-powerpoint->text/html /opt/www/htdig/bin/doc2html.pl
>                         application/x-shockwave-flash->text/html /opt/www/htdig/bin/doc2html.pl \
>                         application/x-shockwave-flash2-preview->text/html /opt/www/htdig/bin/doc2html.pl
>
> # local variables:
> # ----- debut de francisation -----
> locale: fr_FR
> valid_punctuation: ._/!#$%^&
>
> # Search options names:
> method_names: and 'Tous les mots' or 'Un des mots' boolean Booléen
> sort_names: score Score time Date title Titre revscore 'Score inverse' revtime 'Date inverse' revtitle 'Titre inverse'
>
> # language files:
> endings_dictionary: ${common_dir}/francais.0
> endings_affix_file: ${common_dir}/francais.aff
> bad_word_list: ${common_dir}/bad_words.fr
> synonym_dictionary: ${common_dir}/synonyms.fr
> # ----- fin de francisation -----
>
> # mode: text
> # eval: (if (eq window-system 'x) (progn (setq font-lock-keywords (list
> '("^#.*" . font-lock-keyword-face) '("^[a-zA-Z][^ :]+" .
> font-lock-function-name-face) '("[+$]*:" . font-lock-comment-face) ))
> (font-lock-mode)))
> # end:
>
> Many thanks for responses.
>
> Alain DESEINE.
>
>
>
> -------------------------------------------------------
> This SF.NET email is sponsored by:
> SourceForge Enterprise Edition + IBM + LinuxWorld http://www.vasoftware.com
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>
>

