Hi Daniel,

   we do it a bit differently - we also have a lot of OCR documents.
The PDF format allows you to create a two-layer PDF - the top layer
holds the scanned page as an image (this is what users see) and the
layer below it holds the OCR text recognized from that image. This
solution has a lot of advantages - at the very least, the pdf.txt files
are created by DSpace itself and you do not need to make any changes by hand.
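
   As a rough illustration of the two-layer idea, here is a small Python
sketch that guesses whether a PDF already carries a text layer. It is only
a heuristic and not part of our workflow: a page that draws text has to
reference a font resource, so an image-only scan usually contains no /Font
entry. The function name and the sample byte strings are made up for this
example, and PDFs that pack their dictionaries into compressed object
streams can fool it.

```python
import re


def looks_like_it_has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic check for a text layer in raw PDF bytes.

    Assumption: any page that draws text references a /Font resource,
    while an image-only scan references only /XObject images.
    """
    # Match the bare /Font name, but not /FontDescriptor, /FontFile, etc.
    return re.search(rb"/Font(?![A-Za-z])", pdf_bytes) is not None


# Hypothetical, heavily shortened page objects, for illustration only.
image_only = (
    b"%PDF-1.4\n1 0 obj << /Type /Page "
    b"/Resources << /XObject << /Im0 2 0 R >> >> >> endobj"
)
two_layer = (
    b"%PDF-1.4\n1 0 obj << /Type /Page "
    b"/Resources << /Font << /F1 3 0 R >> /XObject << /Im0 2 0 R >> >> >> endobj"
)

print(looks_like_it_has_text_layer(image_only))
print(looks_like_it_has_text_layer(two_layer))
```

   A file that fails this check would most likely give DSpace nothing to
extract, so it is a candidate for another OCR pass before upload.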

   I think we use two tools to create such PDFs - FineReader and
InftyReader. However, this is not my part of the project, so I am not sure
whether both are necessary or what the workflow is. If you are interested
in more details, let me know and I will point you to the right people
:-).

   Have a nice day

   Vlastik

----------------------------------------------------------------------------
Vlastimil Krejčíř
Library and Information Centre, Institute of Computer Science
Masaryk University, Brno, Czech Republic
Email: krejcir (at) ics (dot) muni (dot) cz
Phone: +420 549 49 3872
ICQ: 163963217
Jabber: [email protected]
----------------------------------------------------------------------------

On Thu, 7 Mar 2013, Daniel Sifton wrote:

> Hi folks,
>
> We’ve uploaded a limited amount of OCR pdf documents. Were we to edit the
> OCR bitstream (.pdf.text) does anyone have any advice on how to go about
> getting out the bitstream and then getting it back in? Or perhaps I’m coming
> at this from the wrong angle?
>
> Thanks,
>
> Dan

_______________________________________________
Dspace-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-general