Re: reading text out of ps/pdf

Christopher Jones Sat, 13 Jan 2001 10:33:36 -0800

I have that tool. But some pdf or ps files consist not of coded text but of a
bitmapped image. For instance, the pdf and ps files which I download from journal
databases are scanned images of journal pages. ps2ascii and pdftotext will not
extract text from these files, since there is no ascii content to extract. 

Anyway, that is the best explanation I have been able to figure out, by examining
the contents of pdf and ps files and seeing that the post-preamble stuff is
sometimes text, sometimes not, and seeing that ps2ascii poops out on the
latter, though not on the former.
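
That by-hand inspection could be roughed out in code. The sketch below is only
a heuristic and rests on assumptions: it looks at uncompressed PostScript only
(not pdf), the function name classify_postscript is made up for illustration,
and the operator lists are a partial selection of standard PostScript text and
image operators. A file whose image data is ASCII85-encoded could produce
accidental matches, and a file that draws text through its own procedures
would still count as text because of the "show" in its prolog.

```python
import re

# Operators that paint text. Partial list, chosen for illustration.
# The lookarounds keep "showpage" and names like "/show" from matching.
TEXT_OPS = re.compile(
    rb"(?<![\w/])(?:show|ashow|widthshow|awidthshow|kshow|xshow|yshow|xyshow)(?!\w)"
)

# Operators that paint sampled (bitmapped) images.
IMAGE_OPS = re.compile(rb"(?<![\w/])(?:image|imagemask|colorimage)(?!\w)")


def classify_postscript(data: bytes) -> str:
    """Guess whether PostScript data holds coded text or only a bitmap.

    Hypothetical helper: returns "has-text", "image-only", or "unknown".
    """
    text_hits = len(TEXT_OPS.findall(data))
    image_hits = len(IMAGE_OPS.findall(data))
    if text_hits > 0:
        return "has-text"
    if image_hits > 0:
        return "image-only"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python classify_ps.py file1.ps file2.ps ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, classify_postscript(handle.read()))
```

A file reported as "image-only" is one where ps2ascii would be expected to
come up empty, though the guess can be wrong in either direction.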

So my question is: is there any software out there which attempts to look at
bitmaps and guess what the ascii would be, something like those programs which
read books through a scanner and try to match font characters to the image? And
I say this question is a reach, because I know that those programs which I have
heard about are either very expensive or very inaccurate.

Thanks very much for the response.

On Sun, 14 Jan 2001, you wrote:
> yes
> 
> there is a tool called ps2ascii, it extracts plain text from *.ps files
>
