On Mon, 26 Jan 2009 14:06:23 -0800, Gary Kline <kl...@thought.org> wrote:
>       So what kind of moron is going to photograph pages --or maybe just
>       get-screenshot-of-this-page" and upload it? 

The PDF serves as a container for page images in this context.
Another idea would be to have separate image files, one file per
page, that you could view with your favourite image viewer.

The advantage of the PDF container is that you can easily print
a bunch of pages (or, a book).



>  Or a Real question:
>       I read an online pdf of "The Art of War" from the 1880's [?], and
>       it was in an old-English or olden-Deutsch type font.  In PDF.  i
>       have other p.d. texts in pdf and am wondering in there is some
>       sort of scanner than can take a book-length script and create a
>       pdf file.  Anybody know?  

It's very complicated to handle old fonts using OCR techniques.
It's even quite complicated with today's standard fonts. Although
there are (usually expensive) OCR programs with good algorithms,
most documents need some work afterwards. It's not only about
correcting misrecognized characters; you have to handle hyphenation
and paragraph typesetting as well.
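
To give an idea of the cleanup work on hyphenation and paragraphs,
here is a minimal Python sketch. The function names and the sample
text are made up for illustration, and the rule is deliberately
naive: it assumes every line-ending hyphen between lowercase letters
is a soft hyphen, so a real compound like "well-" / "known" split
across lines would be joined wrongly and still needs a human check.

```python
import re

def join_hyphenated(text: str) -> str:
    """Join words that were split across a line break with a hyphen."""
    # Only lowercase letters on both sides, to reduce false joins.
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)

def reflow_paragraphs(text: str) -> str:
    """Collapse line breaks inside paragraphs; keep blank-line breaks."""
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())

# Hypothetical OCR output with hard line breaks and split words.
sample = "It is compli-\ncated to handle\nold fonts.\n\nNext para-\ngraph here."
cleaned = reflow_paragraphs(join_hyphenated(sample))
print(cleaned)
```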

I know that there are scanners that can pull a stack of paper
sheets through an automatic feeder, scan them, and finally have
a PDF file ready for FTP download. But there's no OCR involved,
of course.


> I got a bunch of ^L bytes and nothing
>       else. 

Ctrl-L (^L) is the form feed character (FF, ASCII 12), used as a
page break. Getting nothing else means the pages contain only
images, which cannot be converted into characters without OCR.
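
If you want to check this on the extracted output, a small Python
sketch like the following counts the form feeds and the pages that
actually carry text. The function name and the sample bytes are
assumptions for illustration; it only presumes that the extractor
separates pages with FF bytes, as your output suggests.

```python
FORM_FEED = b"\x0c"  # ASCII 12, displayed as ^L

def summarize(data: bytes) -> tuple[int, int]:
    """Return (number of form feeds, number of pages containing text)."""
    form_feeds = data.count(FORM_FEED)
    # A chunk that is empty or only whitespace is a page without text.
    pages_with_text = sum(1 for chunk in data.split(FORM_FEED) if chunk.strip())
    return form_feeds, pages_with_text

# Hypothetical outputs: an image-only PDF and a PDF with real text.
image_only = b"\x0c\x0c\x0c"
with_text = b"Page one\n\x0cPage two\n\x0c"
print(summarize(image_only))
print(summarize(with_text))
```

For a real file, read it in binary mode and pass the bytes to
summarize().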



> Now I'm looking at the file with od -c and, yup, it's and
>       image. The parts inbetween pages are in ASCII.  Do you know what
>       "MediaBox" is?

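Not an image container: "MediaBox" is an entry in each page's
dictionary that gives the page size as a rectangle, for example
"/MediaBox [0 0 612 792]" in points (1/72 inch). The scanned image
itself is stored separately as an image object that the page draws.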
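
For what it's worth, in the PDF format a /MediaBox entry holds four
numbers (lower-left x, lower-left y, upper-right x, upper-right y)
in units of 1/72 inch. Since you are already looking at the raw
bytes, here is a rough Python sketch that pulls those numbers out.
The function names and the sample bytes are made up, and the regex
is an assumption that only works when the page objects are stored
uncompressed with literal numbers; entries inside compressed object
streams will not be found this way.

```python
import re

# Matches e.g. "/MediaBox [0 0 612 792]" in uncompressed PDF objects.
MEDIABOX = re.compile(
    rb"/MediaBox\s*\[\s*([-+\d.]+)\s+([-+\d.]+)\s+([-+\d.]+)\s+([-+\d.]+)\s*\]"
)

def media_boxes(data: bytes) -> list:
    """Return every MediaBox found as a tuple of four floats."""
    return [tuple(float(v) for v in m.groups()) for m in MEDIABOX.finditer(data)]

def size_in_inches(box) -> tuple:
    """Convert a MediaBox rectangle to (width, height) in inches."""
    llx, lly, urx, ury = box
    return ((urx - llx) / 72.0, (ury - lly) / 72.0)

# Hypothetical page object; 612 x 792 points is US Letter.
sample = b"<< /Type /Page /MediaBox [0 0 612 792] /Contents 4 0 R >>"
boxes = media_boxes(sample)
print(boxes)
print(size_in_inches(boxes[0]))
```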



>       At least the web article was not an image! 

Never mind, I know "important" web pages where the text content
actually IS an image, and of course there's no alt= or longdesc=
attribute because they're for weenies. :-)





-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
