On Tue, Dec 02, 2008 at 02:07:30AM +0100, Roland Smith wrote:
> On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> > pdftotext fails on the large [32MB] file I've got. Is there any
> > other way I can translate this huge textfile to ascii or html or
> > text?
>
> Please define "fail" in this context. I've used pdftotext on documents
> exceeding 40MB. However, there are of course things that don't work:
>
> 1) Some PDFs are just wrappers around JPEG images. In this case there is
>    no text for pdftotext to convert => epic fail.
>
> 2) If the text contains ligatures etc., you should use a proper
>    encoding that contains such characters (e.g. '-enc UTF-8') or you will
>    lose them.
>
> 3) Things like equations will not render well, if at all. This also
>    depends on the encoding.
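
Points 1 and 2 in the quoted list can be checked with a short script. The following is only a rough sketch, not anything posted in this thread: it assumes pdftotext is on the PATH, passes the '-enc UTF-8' option mentioned above, and uses '-' as the output file so the text goes to stdout. The function names and the 20-character threshold are made up for illustration; a PDF that yields almost no non-whitespace text is probably just a wrapper around scanned images.

```python
import subprocess


def extract_text(pdf_path, encoding="UTF-8"):
    """Run pdftotext on pdf_path and return the extracted text.

    '-enc' selects the output encoding (UTF-8 keeps ligatures);
    '-' as the output file name sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", encoding, pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext exits non-zero
    )
    return result.stdout.decode(encoding, errors="replace")


def looks_image_only(text, min_chars=20):
    """Hypothetical heuristic: True if the text is (nearly) empty.

    pdftotext separates pages with form feeds, which count as
    whitespace here, so an image-only PDF yields a count near zero.
    The min_chars threshold is an arbitrary assumption.
    """
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars
```

Calling looks_image_only(extract_text("big.pdf")) on the file in question (the file name is a placeholder) would suggest whether pdftotext found any text layer at all, which is one way to answer the 'define "fail"' question.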

	It probably was a PDF wrapped around a JPEG. I was able to convert
	another PDF to plain text in a flash. (*sigh*) It wasn't a total
	waste of time, because I found the entire text transferred to buggy
	ASCII somewhere [[ thanks to some prof ]]. So, if I ever want to run
	aspell against a 900-page file, at least I have that option!

	gary

>
> Roland
> --
> R.F.Smith                                   http://www.xs4all.nl/~rsmith/
> [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
> pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

--
 Gary Kline  [EMAIL PROTECTED]  http://www.thought.org  Public Service Unix
        http://jottings.thought.org   http://transfinite.thought.org
 Flash: The alpha release of Jottings is available:
        http://jottings.thought.org/index.php