Hi, Robin

In my work, I have a lot of query logs produced by a search engine, and we
use Hadoop as our tool to analyse that data. Sometimes I'd like to do some
data mining jobs, such as clustering similar queries or classifying them.
At first, I thought Mahout might be another option for me to do data
mining jobs (as you know, Weka is my favorite data mining tool). But when
I tried to integrate Mahout into my project, I found two major obstacles
that prevented me from moving further:

First, in my company, Hadoop 0.19 is provided as the platform for our
daily jobs. As we know, Mahout depends on Hadoop 0.20 or above.
This prevents me from benefiting from the functions provided by Mahout.

Second, the input data has to be indexed by Lucene first (right or
wrong?) and then imported into Mahout. This confuses me very much, because
so much of our data is stored in HDFS. In order to use Mahout, I would have
to pull all the data out of HDFS first, index it with Lucene, and so on.
That seems impractical to me.

So I haven't used Mahout in my daily work. However, I keep paying
attention to Mahout; maybe someday I will benefit from it.

What do others think?

On Wed, Feb 10, 2010 at 6:19 PM, Robin Anil <[email protected]> wrote:

> Hi Mahouters
>      I am trying to find out how you are using Mahout for your work or
> project, or which among the algorithms in Mahout are more important for you
> to do that work. And finally, what do you expect to see in Mahout (a kind of
> wish list)? It won't take much of your time. Please reply with these details.
> It will help a great deal in figuring out what we need to
> prioritize.
>
> Thanks
> Robin
>



-- 
http://anqiang1900.blog.163.com/
