How to get Content from PDF document.

Jonathan Muniz Thu, 24 Apr 2014 07:22:48 -0700

Hi all,
I have PDF documents whose contents I want to extract, so that I can
present a summary and also show the passage containing the text the
user searched for.


To do this, I currently have to create plain-text copies of the documents.
This is how I get at the content:
// Load the documents under the target folder
ItemIterable<QueryResult> documentsResultSet = sessionCopia.query(
        "SELECT * FROM cmis:document WHERE IN_FOLDER('" + parentFolder + "') AND "
        + "cmis:name = '" + fileName + "'", false).getPage();
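A minimal sketch of building the same query string with the values escaped, in case a file name contains a single quote. The class and method names (`CmisQueryBuilder`, `escape`, `documentByName`) are made up for illustration; the escaping rule assumed here is the CMIS query-language convention of putting a backslash before `'` and `\` inside string literals.

```java
// Hypothetical helper, not part of any CMIS library.
final class CmisQueryBuilder {

    // Escape a value for use inside a single-quoted CMIS string literal
    // (assumption: backslash before ' and \).
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\'' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build the "document by name in folder" query used above.
    static String documentByName(String parentFolder, String fileName) {
        return "SELECT * FROM cmis:document WHERE IN_FOLDER('" + escape(parentFolder)
                + "') AND cmis:name = '" + escape(fileName) + "'";
    }
}
```

The result of `CmisQueryBuilder.documentByName(parentFolder, fileName)` would be passed as the first argument to `query(...)` in place of the hand-concatenated string.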

Then I use the ID to fetch the object:

CmisObject object = sessionCopia
.getObject(documentSearchResult.getId());
Document document = (Document) object;

// Here I read the whole stream, convert it to a string, and look for
// searchParam in the plain text, using the Java API.

return TransformAndExtractInputStreamForStringCmis.getInputStreamToText(
        document.getContentStream().getStream(), searchParam);
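For reference, a rough sketch of what a "read the stream and pull out the text around the search term" step can look like. This is not the actual `TransformAndExtractInputStreamForStringCmis` code; `SnippetExtractor`, `readAll`, `snippet` and the `context` parameter are hypothetical names, and it assumes the stream is the UTF-8 plain-text copy, not the PDF bytes themselves.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class SnippetExtractor {

    // Read the whole stream into a String (assumes UTF-8 plain text).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Case-insensitive search for the first occurrence; -1 if absent.
    static int indexOfIgnoreCase(String text, String term) {
        int len = term.length();
        for (int i = 0; i + len <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, len)) {
                return i;
            }
        }
        return -1;
    }

    // Return the first hit plus up to `context` characters on each side,
    // or null when the term is empty or does not occur in the text.
    static String snippet(String text, String searchParam, int context) {
        if (searchParam.isEmpty()) {
            return null;
        }
        int hit = indexOfIgnoreCase(text, searchParam);
        if (hit < 0) {
            return null;
        }
        int start = Math.max(0, hit - context);
        int end = Math.min(text.length(), hit + searchParam.length() + context);
        return text.substring(start, end);
    }
}
```

With the document from above, the call would be something like `SnippetExtractor.snippet(SnippetExtractor.readAll(document.getContentStream().getStream()), searchParam, 80)`. It still downloads the whole content for every hit, which is the part I would like to avoid.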

Could someone point me to a better way of doing this? I thought I could
run the search against the document content and extract the matching text
from something that is already indexed.
Finding the indexed document was easy.
But what would finding the content inside it and extracting it with the
CMIS API look like?

Thank you.
