Re: Taste Vs Weka

Grant Ingersoll Wed, 27 Aug 2008 07:33:51 -0700


On Aug 27, 2008, at 8:33 AM, Richard Tomsett wrote:

There's quite a good description of WEKA and its capabilities on the course page for a module I took this year: http://www.inf.ed.ac.uk/teaching/courses/dme/html/software2.html
It's more a general suite of data-mining tools rather than a tool to address a specific task like Taste (plus it's obviously not implemented for parallel processing which could be problematic for scaling up). From the link above:
  * *Advantages*: The obvious advantage of a package like Weka is that
    *a whole range of data preparation, feature selection and data
    mining algorithms are integrated*. This means that only one data
    format is needed, and trying out and comparing different
    approaches becomes really easy. The package also comes with *a
    GUI*, which should make it easier to use.

Yeah, it would be good for Mahout to adopt an approach for either translating from ARFF to our format, or just use ARFF or whatever else Weka does, but I don't want it to preclude us from innovating where we need to innovate.
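
To make the ARFF translation idea concrete, here is a rough sketch of what such a converter might look like. It is an illustration only, not Mahout or Weka code: the function names parse_arff and to_vectors are hypothetical, the target "format" is just a list of float lists standing in for whatever Mahout ends up using, and it only handles the simple dense case (no quoted values containing commas, no sparse rows, no missing values marked with '?').

```python
def parse_arff(text):
    """Parse a minimal dense ARFF document into (relation, attributes, rows).

    Assumption: values are unquoted and comma-separated, one row per line.
    """
    relation = None
    attributes = []  # list of (name, type) pairs, in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


def to_vectors(attributes, rows):
    """Keep only the numeric columns and convert them to floats."""
    numeric_types = ("numeric", "real", "integer")
    numeric = [i for i, (_, kind) in enumerate(attributes)
               if kind.lower() in numeric_types]
    return [[float(row[i]) for i in numeric] for row in rows]


SAMPLE = """% toy example
@relation weather
@attribute temperature numeric
@attribute humidity numeric
@attribute play {yes,no}
@data
85,85,no
80,90,yes
"""

relation, attributes, rows = parse_arff(SAMPLE)
vectors = to_vectors(attributes, rows)
```

Dropping the nominal column is just the simplest choice for the sketch; a real converter would have to decide how to encode nominal attributes and missing values.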



  * *Disadvantages*: Probably the most important disadvantage of data
    mining suites like this is that *they do not implement the newest
    techniques*. For example the MLP implemented has a very basic
    training algorithm (backprop with momentum), and the SVM only uses
    polynomial kernels, and does not support numeric estimation. ...
    *A third possible problem is scaling*. For difficult tasks on
    large datasets, the running time can become quite long, and java
    sometimes gives an OutOfMemory error. This problem can be reduced
    by using the '-mx/x/' option when calling java, where /x/ is
    memory size (eg '50m'). For large datasets it will always be
    necessary to reduce the size to be able to work within reasonable
    time limits. A fourth problem is that *the GUI does not implement
    all the possible options*. Things that could be very useful, like
    scoring of a test set, are not provided in the GUI, but can be
    called from the command line interface. So sometimes it will be
    necessary to switch between GUI and command line. Finally, *the
    data preparation and visualisation techniques offered might not be
    enough*. Most of them are very useful, but I think in most data
    mining tasks you will need more to get to know the data well and
    to get it in the right format.

From a Mahout view, we are very much aiming at addressing the scaling issue. As for the GUI, I think that will always be a "contrib" for Mahout, if one ever exists. My personal goal for Mahout is to keep it lean and easily usable in a wide variety of applications. Just as Lucene has made search a commodity in many ways, I think Mahout could enable ML to be a commodity in 5 years.

Also, a glaring difference between the two is that Weka is GPL. I'll leave it to you to read all the discussions on ASL vs. GPL; I do not want to start that discussion here, as there is no point.

Last, I imagine we will all coexist nicely. Weka will be useful for many tasks, and Mahout will be useful for many tasks, and there will certainly be overlap.
