Eugen,

There are several very closely related projects here (from the standpoint of
Mahout).  These include Hadoop (required for scaling several Mahout
programs), Lucene (often used to collect documents), Tika (useful in
conjunction with Lucene to extract and process text) and, as you note, UIMA.

While all of these projects have something to do with data mining and
unstructured text, the dividing line is fairly simple: if a task concerns
the data itself or the computing platform, it generally belongs to UIMA,
Lucene or Hadoop, while if it concerns the actual mathematics involved in
the data mining, Mahout will be doing the work.

As Isabel says, there is little explicit glue code available but integrating
software from these projects is not typically very difficult.  There is a
huge variety of ways to do this, however, so it is hard to anticipate what
use cases are really important.  If you have a specific use case, please
describe it.

On Sun, Jun 13, 2010 at 6:20 AM, Eugen Paraschiv <[email protected]> wrote:

> Hi, I'm starting to use Mahout for some text analysis work, and I was
> looking at the multitude of Apache projects that are out there. I have a
> question regarding the relation between Mahout and Apache UIMA, another
> project that seems to be dealing with machine learning and data mining.
> There may not be any explicit relation, none that I could find anyway, and I
> don't know if Mahout addresses or will ever address the topic of analysis
> and mining of unstructured content, or if it's outside the scope of the
> project. So, is this a direction Mahout will evolve towards in the
> future? Thanks. Eugen.
>
