On Feb 9, 2010, at 9:56 AM, Robin Anil wrote:

> Yeah that sounds ok. Do we have the pure content without HTML?

No, but I was just thinking yesterday that a really nice enhancement to the
document vectorizer would be to hook in Tika, so that one could run binary
files through MapReduce and turn them into Mahout vectors.  Thoughts?  Tika
integration should be pretty trivial.  I can likely help later in the week.

-Grant
