Re: [Dspace-tech] searching, PDFs, HTML and XML

Andrew Marlow Fri, 12 Dec 2008 15:58:42 -0800

On Fri, Dec 12, 2008 at 3:31 PM, Shane Beers <sbe...@gmu.edu> wrote:

> Andrew:
> Performing OCR on a PDF document is, as far as I know, the most widely used
> method to search a PDF document.



I see. I didn't know that.


> Is there a specific reason you do not want the PDFs to be searchable?


I ***do*** want the docs to be searchable. It just so happens that they are
not. This is a situation I want to change. I am trying to find out how to
make it happen.

> I use the commercial product ABBYY Finereader.


> It looks like this would be able to fit your needs. However, I would be of
> the opinion that just performing OCR would be the most direct and stable
> option.

The trouble is that I have a large number of non-searchable PDFs that exist
in a certain repo and I want to import them into DSpace. There are simply
too many of them to OCR, and I do not want to spend money on a proprietary
solution.
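
Before deciding how much OCR is really needed, it might help to measure how
many of the files actually lack a text layer. Below is a rough sketch of that
check, not a tested tool. It assumes the free poppler utility pdftotext is
installed and on the PATH; the function names and the 20-character threshold
are illustrative choices of mine, not anything from DSpace.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def extract_text_with_pdftotext(pdf_path: Path) -> str:
    """Return the text layer of a PDF via poppler's pdftotext (assumed installed)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True,
        text=True,
        errors="replace",  # tolerate odd bytes in the extracted text
        check=False,
    )
    # Treat a failed extraction the same as "no text found".
    return result.stdout if result.returncode == 0 else ""


def find_unsearchable(
    pdf_paths: Iterable[Path],
    extract: Callable[[Path], str] = extract_text_with_pdftotext,
    min_chars: int = 20,  # illustrative threshold, tune for your collection
) -> List[Path]:
    """Return the PDFs whose extracted text is shorter than min_chars."""
    return [p for p in pdf_paths if len(extract(p).strip()) < min_chars]


# Hypothetical usage: list every PDF under a directory that needs OCR.
# for pdf in find_unsearchable(sorted(Path("/path/to/pdfs").rglob("*.pdf"))):
#     print(pdf)
```

Only the files this reports would need to go through OCR; the rest already
carry text that can be indexed.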

>
> Shane Beers
> Digital Repository Services Librarian
> George Mason University
> sbe...@gmu.edu
> http://mars.gmu.edu
> 703-993-3742


Many thanks for your response, everyone here is being very helpful :-)
-- 
Regards,

Andrew M.


