On Mon, 22 Feb 1999, Gilles Detillieux wrote:

> Date: Mon, 22 Feb 1999 13:23:16 -0600 (CST)
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> To: "Joe R. Jah" <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] pdf parser: No error;)  Search: No results;(
> 
> According to Joe R. Jah:
> > I run ht/Dig 3.1.1 including the parser patch on a BSDI 4.0 box.  In my
> > htdig.config I have:
> > 
> >     pdf_parser:             /usr/contrib/bin/pdftops
> > 
> > rundig does not complain about any pdf files except two large files, for
> > which I plan to increase: 
> > 
> >     max_head_length:        50000
> > 
> > to some very high number; however, search does not find any words in pdf
> > files; they do not show up in any results.
> > 
> > Has anyone successfully used pdftops to dig pdf files?
> > 
> > I appreciate any pointers.
> 
> The code in htdig/PDF.cc expects the PostScript output from the pdf
> parser to be in a very specific format -- the one that acroread outputs.
> The latest version of xpdf is supposed to output PostScript in a
> compatible format, from what I've read on this list, but I haven't seen
> any mention of pdftops.  My guess, given the lack of results you reported,
> is that its PostScript output is not compatible.  If it doesn't find
> the tags it expects in the PostScript, it won't give any error messages.
> It'll just silently ignore what's there as it scans for the beginning
> of text block marker.

pdftops is part of the xpdf package; I just downloaded, compiled, and
installed the latest version of xpdf, and reran rundig.  Still no search
results;*(

> As for dealing with large files, it's max_doc_size you need to adjust.
> By default, it's 100000, so you need to increase it if dealing with files
> larger than 100K.  The max_head_length attribute determines how much of
> the document text will be stored for excerpts, but this is done on the
> processed text.

Thanks; I stand corrected.  I just added max_doc_size to my config file
and set it to 650000 to cover all the existing pdf files in my search
path.
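
In case it helps anyone else pick a value: below is a minimal Python sketch
(not part of ht://Dig) that walks a directory tree and reports the largest
pdf file it finds.  It assumes the documents are readable on local disk and
that max_doc_size is a byte count, which is my understanding; the function
name largest_pdf is made up for illustration.

```python
import os


def largest_pdf(root):
    """Return (size_in_bytes, path) for the largest .pdf under root.

    Returns (0, None) if no pdf files are found.  This is a hypothetical
    helper, not something shipped with ht://Dig.
    """
    best = (0, None)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match .pdf and .PDF alike.
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > best[0]:
                best = (size, path)
    return best
```

Calling largest_pdf() on the document root gives a size that max_doc_size
would need to exceed for that file to be retrieved whole.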

As a side note, I think it would be very helpful if the sample config
file listed all of the options with their default values, each perhaps
with a short comment.

Joe

     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]


