On Tue, Dec 02, 2008 at 02:07:30AM +0100, Roland Smith wrote:
> On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> >     pdftotext fails on the large [32MB] file I've got.  Is there any
> >     other way I can translate this huge textfile to ascii or html or
> >     text?
> Please define "fail" in this context. I've used pdftotext on documents
> exceeding 40MB. However, there are of course things that don't work:
> 1) Some PDFs are just wrappers around JPEG images. In this case there is
> no text for pdftotext to convert => epic fail.
> 2) If the text contains ligatures etc. you should use the proper
> encoding that contains such characters (e.g. '-enc UTF-8') or you will
> lose them.
> 3) Things like equations will not render well, if at all. This also
> depends on the encoding.

        It probably was a PDF wrapped around a JPEG.  I was able to convert
        another PDF to plaintext in a flash.  (*sigh*)  It wasn't a total
        waste of time because I found the entire text transferred to buggy
        ASCII somewhere [[ thanks to some prof ]].  So, if I ever want to run
        pdftotext against a 900-page file, at least I have that option!
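
        For the archives, here is a rough Python sketch of how one might
        spot the "wrapper around JPEG images" case after the fact.  It
        assumes pdftotext is in PATH and uses the '-enc UTF-8' option
        mentioned above.  The helper names and the 20-characters-per-page
        threshold are made up for illustration, not anything pdftotext
        itself provides.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path, encoding="UTF-8"):
    """Return the pdftotext argument list; -enc selects the output encoding."""
    return ["pdftotext", "-enc", encoding, pdf_path, txt_path]


def looks_image_only(text, pages, min_chars_per_page=20):
    """Heuristic (an assumption, not a pdftotext feature): almost no
    visible characters per page suggests the PDF holds scanned images."""
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars_per_page * max(pages, 1)


def convert(pdf_path, txt_path):
    """Run pdftotext and report whether any real text seems to have come out."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found in PATH")
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    # By default pdftotext ends each page with a form feed character.
    pages = text.count("\f") or 1
    return text, looks_image_only(text, pages)
```

        Calling convert("big.pdf", "big.txt") (file names hypothetical)
        returns the extracted text plus a flag; if the flag is set, the
        file probably needs OCR rather than pdftotext.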


> Roland
> -- 
> R.F.Smith                                   http://www.xs4all.nl/~rsmith/
> [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
> pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

 Gary Kline  [EMAIL PROTECTED]  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
 Flash: The alpha release of Jottings is available: 

freebsd-questions@freebsd.org mailing list
