Hi Sebastian,

Check out Apache Tika [1].
Pass the document's content stream to Tika, and Tika should be able to extract all kinds of information about the content, including the plain text.
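
A minimal sketch of that approach in Java. It assumes the Apache Chemistry OpenCMIS client library and Apache Tika are on the classpath. You need tika-core plus the Tika parser modules, since tika-core alone detects formats but does not extract text. The class and method names (`CmisFullText`, `extractText`) are hypothetical. The sketch reads the CMIS content stream of a `Document` and hands it to the `Tika` facade's `parseToString` method, which auto-detects the format:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.chemistry.opencmis.client.api.Document;
import org.apache.chemistry.opencmis.commons.data.ContentStream;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

/** Hypothetical helper: extracts plain text from CMIS documents with Apache Tika. */
public final class CmisFullText {

    // One Tika facade, reused for every call; it auto-detects the document format.
    private static final Tika TIKA = createTika();

    private CmisFullText() {
    }

    private static Tika createTika() {
        Tika tika = new Tika();
        // parseToString truncates at 100,000 characters by default; -1 disables the limit.
        tika.setMaxStringLength(-1);
        return tika;
    }

    /** Returns the document's text, or null if the document has no content stream. */
    public static String extractText(Document document) throws IOException, TikaException {
        ContentStream content = document.getContentStream();
        if (content == null) {
            return null; // a document object without any content attached
        }
        return extractText(content.getStream());
    }

    /** Detects the format of the stream and returns its plain text. */
    public static String extractText(InputStream in) throws IOException, TikaException {
        // try-with-resources guarantees the (possibly remote) stream is released.
        try (InputStream stream = in) {
            return TIKA.parseToString(stream);
        }
    }
}
```

Calling `CmisFullText.extractText(document)` on each `Document` your CMIS query returns should give you the full text as a `String`. Disabling the length limit is convenient for mining but may need revisiting for very large files. If you also want the metadata (content type, author and so on), `Tika.parseToString(InputStream, Metadata)` fills a `Metadata` object alongside the text.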


- Florian


[1] http://tika.apache.org/1.0/parser.html


Dear all,

we are trying to mine documents that we retrieve via CMIS.

What's the best way to get the full text (as a String) out of a Document
object?

Best regards and thanks,

Sebastian
