According to Robert Cerny:
> looks like xpdf problem?! When trying to run manually pdftotext I'm getting
> error: Error (1024): PDF version 1.3 -- xpdf supports version 1.2
> (continuing anyway)

No, it doesn't seem that way to me.  If you get the message above when
you run pdftotext manually, you would get the same message if htdig
were running it as an external parser.  The messages below suggest
htdig is using pdf_parser, not external_parsers, to parse PDFs.
Most likely, it's acroread that's catching the segmentation violation,
probably because of a corrupt or truncated PDF.

> > I'm  a newbie with htDig so excuse my question if the answer is stupid one.
> > I'm trying to index some pdf files created with distiller 4 and I'm getting
> > following errors:
> > http://www.dataline.cz/Other/amnestie.pdf: /tmp/htdig23334.pdf: Segmentation Violation Caught.
> > PDF::parse: error running pdf_parser on
> > http://www.dataline.cz/Other/amnestie.pdf  size = 96982
> >
> > Any ideas?

Is the size reported above (96982 bytes) the actual size of your
amnestie.pdf file?  If the reported size is smaller than the real file,
then htdig probably truncated the document at the max_doc_size limit
(100000 bytes by default), so you should set this attribute to a larger
size in your htdig.conf (see http://www.htdig.org/FAQ.html#q5.2).
Also, if this is the case, and it didn't cut off at exactly 100000 bytes,
you must be running an older beta release of htdig, and not the latest
3.1.2 release, so you may want to consider upgrading.
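
As a rough illustration, raising the limit in htdig.conf would look
something like the fragment below.  The value shown is only an example
I picked for this note, not a recommendation; choose something larger
than your biggest PDF.

```
# htdig.conf fragment (example value only)
# Allow documents up to about 5 MB to be retrieved in full
max_doc_size:	5000000
```

You would need to reindex after changing this for it to take effect.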

On the other hand, if your PDF file is exactly the size reported, you
should look into why acroread is crashing on it.  Try running acroread
on it manually, both with and without the -toPostScript option, to see
what errors it comes up with.  If the file is indeed corrupt, you should
regenerate it with Distiller.
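
If it helps, the size comparison and a crude truncation check can be
scripted.  The following is only a sketch in Python, not part of htdig:
the function name inspect_pdf and its fields are made up for this
example, and it assumes a complete PDF begins with "%PDF-" and has a
"%%EOF" marker somewhere in its last 1024 bytes.  Passing this check
does not prove the file is sound; failing it is a strong hint that the
file was cut short.

```python
import os


def inspect_pdf(path, reported_size, tail_bytes=1024):
    """Compare a PDF's on-disk size with a reported size and look for
    the header and end-of-file marker (hypothetical helper)."""
    actual_size = os.path.getsize(path)
    with open(path, "rb") as f:
        # A PDF normally starts with "%PDF-" followed by the version.
        header = f.read(5)
        # The "%%EOF" marker normally sits near the very end of the file.
        f.seek(max(0, actual_size - tail_bytes))
        tail = f.read()
    return {
        "actual_size": actual_size,
        "size_matches": actual_size == reported_size,
        "has_pdf_header": header == b"%PDF-",
        "has_eof_marker": b"%%EOF" in tail,
    }
```

For the file in question you would call something like
inspect_pdf("amnestie.pdf", 96982) on a local copy.  If size_matches is
false, suspect max_doc_size; if has_eof_marker is false, suspect a
truncated or damaged file.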

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
