https://issues.apache.org/jira/browse/TIKA-433 might be of interest to people
looking to extract text from Office, PDF, and similar documents and then
convert that text into Mahout vectors.

-Grant
