[Dspace-tech] searching, PDFs, HTML and XML

2008-12-12 Thread Andrew Marlow
Hello, Now that I have loaded a few PDFs into my DSpace repo, I am wondering how to enable full text searching. The PDFs happen to be in a form that means they cannot be searched directly. So when I search in DSpace I get no results returned (unless the text also appears in the abstract I entered

Re: [Dspace-tech] searching, PDFs, HTML and XML

2008-12-12 Thread Shane Beers
Andrew: Performing OCR on a PDF document is, as far as I know, the most widely used method to search a PDF document. Is there a specific reason you do not want the PDFs to be searchable? Even the archival standard of PDF/A (archival PDF) allows for OCR. I use the commercial product ABBYY

Re: [Dspace-tech] searching, PDFs, HTML and XML

2008-12-12 Thread Mark H. Wood
On Fri, Dec 12, 2008 at 08:44:49AM +, Andrew Marlow wrote: Now that I have loaded a few PDFs into my DSpace repo, I am wondering how to enable full text searching. The PDFs happen to be in a form that means they cannot be searched directly. So when I search in DSpace I get no results Do

Re: [Dspace-tech] searching, PDFs, HTML and XML

2008-12-12 Thread Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS]
-4001 Pager: (757) 988-2547 Email: susan.m.thorn...@nasa.gov -----Original Message----- From: Shane Beers [mailto:sbe...@gmu.edu] Sent: Friday, December 12, 2008 10:31 AM To: Andrew Marlow Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] searching, PDFs, HTML and XML Andrew

Re: [Dspace-tech] searching, PDFs, HTML and XML

2008-12-12 Thread Brian Freels-Stendel
...@gmu.edu] Sent: Friday, December 12, 2008 10:31 AM To: Andrew Marlow Cc: dspace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] searching, PDFs, HTML and XML Andrew: Performing OCR on a PDF document is, as far as I know, the most widely used method to search a PDF document

Re: [Dspace-tech] searching, PDFs, HTML and XML

2008-12-12 Thread Andrew Marlow
On Fri, Dec 12, 2008 at 3:31 PM, Shane Beers sbe...@gmu.edu wrote: Andrew: Performing OCR on a PDF document is, as far as I know, the most widely used method to search a PDF document. I see. I didn't know that. Is there a specific reason you do not want the PDFs to be searchable? I