https://issues.apache.org/jira/browse/TIKA-433 might be of interest to anyone looking to extract text from Office documents, PDFs, etc. and then convert it into Mahout vectors.
-Grant