Now it works great.
Alain DESEINE.
At 12:42 31/01/2003 +0000, David Adams wrote:
All the messages like

PDF::parseNonTextLine: total pages is 49
PDF::parseNonTextLine: start page 1

are being produced by the internal PDF parser, which means that your external_parsers: statement is being ignored. Check that there are no spaces or other characters after the \ at the end of each line in your configuration file.

David Adams
University of Southampton

----- Original Message -----
From: "Alain DESEINE" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, January 27, 2003 4:42 PM
Subject: [htdig] Pbs seting up PDF support

> Hi,
>
> I have problems setting up PDF support for htdig.
>
> Here is info about my setup:
>
> linux version : kernel 2.4.10-4GB
> htdig version : 3.1.6 (compiled from source)
> install dir /opt/www/htdig
> install french support from Dider Lebrun
>
> htdig works well with html files.
>
> I've installed xpdf
> I've installed pdf2html, works well from the linux prompt
> I've installed doc2html, works well from the linux prompt
> I've modified the htdig.conf file to call the doc2html converter for
> application/pdf files
>
> When I run rundig, the PDF files are not indexed, and I get these messages in
> the log:
>
> Deleted, no excerpt: 44/http://www.cabinfo.com/documents/pdf/adsl.pdf
> Deleted, no excerpt: 45/http://www.cabinfo.com/documents/pdf/gprs.pdf
> Deleted, no excerpt: 43/http://www.cabinfo.com/documents/pdf/wap.pdf
>
> I've run rundig with the -vvvv flag and got something like this in the log:
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Mon, 27 Jan 2003 15:17:11 GMT
> Header line: Server: Apache
> Header line: Last-Modified: Mon, 20 Jan 2003 17:22:48 GMT
> Converted Mon, 20 Jan 2003 17:22:48 GMT to Mon, 20 Jan 2003 17:22:48
> Header line: ETag: "13b7ac-50cf6-3e2c3068"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 330998
> Header line: Connection: close
> Header line: Content-Type: application/pdf
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> ...
> ...
> Read 8192 from document
> Read 3318 from document
> Read a total of 330998 bytes
> PDF::setContents(330998 bytes)
> PDF::parse(http://www.cabinfo.com/documents/pdf/wap.pdf)
> PDF::parseNonTextLine: title is "��"
>
> title: ��
> PDF::parseNonTextLine: total pages is 49
> PDF::parseNonTextLine: start page 1
> PDF::parseNonTextLine: begin text block
> PDF::parseTextLine("70.5 40.5 TD") cmd=TD
> PDF::parseTextLine("0 0 0 rg") cmd=rg
> PDF::parseTextLine("/N6 9.75 Tf") cmd=Tf
> PDF::parseTextLine("0.08999 Tc") cmd=Tc
> PDF::parseTextLine("0 Tw") cmd=Tw
> PDF::parseTextLine("(\251)Tj ") cmd=
> PDF::parseTextLine("7.5 0 TD") cmd=TD
> PDF::parseTextLine("/N8 9.75 Tf") cmd=Tf
> PDF::parseTextLine("0.11048 Tc") cmd=Tc
> PDF::parseTextLine("0.17898 Tw") cmd=Tw
> PDF::parseTextLine("( Alain DESEINE, 1999)Tj ") cmd=
> PDF::parseTextLine("375.75 693 TD") cmd=TD
> PDF::parseTextLine("/N10 14.25 Tf") cmd=Tf
> PDF::parseTextLine("-0.33178 Tc") cmd=Tc
> ...
>
> and so on for the entire content of the pdf ...
>
> This information tells me that it's the internal parser that is used to
> parse the pdf, and not the doc2html.pl script, but I'm not sure. I've
> browsed and searched the list archive, but didn't find anything like that, so
> if you can help me ...
>
> Here is the htdig.conf file.
>
> database_dir: /home/info/www/htdig/db
> start_url: http://www.cxabinfo.com/ \
> http://www.cxabinfo.com/index2.html
> limit_urls_to: ${start_url}
> exclude_urls: /cgi-bin/ .cgi
> #bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe
> .com .gif \
> # .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css
> #maintainer: [EMAIL PROTECTED]
> max_head_length: 10000
> max_doc_size: 2000000
> no_excerpt_show_top: true
> search_algorithm: exact:1 synonyms:0.5 endings:0.1
> # template_map: cxabinfo cxabinfo
> /opt/www/htdig/common/htdig_template.html
> template_map: cxabinfo cxabinfo ${common_dir}/htdig_template.html
> # template_name: cxabinfo
> search_results_header: /opt/www/htdig/common/htdig_header.html
> search_results_footer:
> nothing_found_file: /opt/www/htdig/common/htdig_nomatch.html
> syntax_error_file: /opt/www/htdig/common/htdig_syntaxerror.html
> next_page_text: <img src="/htdig/buttonr.gif" border="0"
> align="middle" width="30" height="30" alt="next">
> no_next_page_text:
> prev_page_text: <img src="/htdig/buttonl.gif" border="0"
> align="middle" width="30" height="30" alt="prev">
> no_prev_page_text:
> external_parsers: application/rtf->text/html
> /opt/www/htdig/bin/doc2html.pl \
> text/rtf->text/html /opt/www/htdig/bin/doc2html.pl \
> application/pdf->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/postscript->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/msword->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/wordperfect5.1->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/msexcel->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/vnd.ms-excel->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/vnd.ms-powerpoint->text/html
> /opt/www/htdig/bin/doc2html.pl
> application/x-shockwave-flash->text/html
> /opt/www/htdig/bin/doc2html.pl \
> application/x-shockwave-flash2-preview->text/html
> /opt/www/htdig/bin/doc2html.pl
>
> # local variables:
> # ----- debut de francisation -----
> locale: fr_FR
> valid_punctuation: ._/!#$%^&
>
> # Search options names:
> method_names: and 'Tous les mots' or 'Un des mots' boolean Booléen
> sort_names: score Score time Date title Titre revscore 'Score inverse'
> revtime 'Date inverse' revtitle 'Titre inverse'
>
> # language files:
> endings_dictionary: ${common_dir}/francais.0
> endings_affix_file: ${common_dir}/francais.aff
> bad_word_list: ${common_dir}/bad_words.fr
> synonym_dictionary: ${common_dir}/synonyms.fr
> # ----- fin de francisation -----
>
> # mode: text
> # eval: (if (eq window-system 'x) (progn (setq font-lock-keywords (list
> '("^#.*" . font-lock-keyword-face) '("^[a-zA-Z][^ :]+" .
> font-lock-function-name-face) '("[+$]*:" . font-lock-comment-face) ))
> (font-lock-mode)))
> # end:
>
> Many thanks for responses.
>
> Alain DESEINE.
>
> -------------------------------------------------------
> This SF.NET email is sponsored by:
> SourceForge Enterprise Edition + IBM + LinuxWorld http://www.vasoftware.com
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
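For anyone hitting the same problem: the check David suggests (nothing after the \ at the end of a continued line) is hard to do by eye, because trailing spaces, tabs and DOS carriage returns are invisible in most editors. Below is a minimal sketch in Python, not part of ht://Dig. The function name find_broken_continuations is made up for this example, and it only looks for invisible whitespace after a final backslash, on the assumption that this is the usual culprit.

```python
import re

# A backslash followed only by spaces, tabs or a carriage return up to the end
# of the line. In that case the backslash is no longer the last character, so
# (as described in the reply above) the continuation is not honoured.
_WHITESPACE_AFTER_BACKSLASH = re.compile(r"\\[ \t\r]+$")


def find_broken_continuations(config_text):
    """Return (line_number, line) pairs where whitespace follows a trailing backslash.

    Hypothetical helper for illustration; line numbers start at 1.
    """
    problems = []
    # Split on "\n" only, so a stray "\r" from DOS line endings stays visible.
    for number, line in enumerate(config_text.split("\n"), start=1):
        if _WHITESPACE_AFTER_BACKSLASH.search(line):
            problems.append((number, line))
    return problems


# Made-up two-line sample: the first line has a space after the backslash.
sample = (
    "external_parsers: application/pdf->text/html /opt/www/htdig/bin/doc2html.pl \\ \n"
    "                  application/msword->text/html /opt/www/htdig/bin/doc2html.pl\n"
)
for number, line in find_broken_continuations(sample):
    print(f"line {number}: whitespace after trailing backslash: {line!r}")
```

To check a real file, read it in binary-safe fashion first, for example open(path, newline="").read(), so that Python does not silently strip the carriage returns you are trying to find.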

