In the Data Mining world, which is dominated by Computer Scientists,
the methodology behind commercially sold or licensed software packages
is often proprietary. Take, for example, the classification and
regression trees software package CART(r). The basic idea behind
CART(r) is the algorithm proposed by Breiman, Friedman, Olshen, and
Stone (1984). However, there have been quite a few proprietary
improvements to CART(r), so you can no longer know for sure what's
going on inside the software package. The same is true for C5.0/See5
(another classification tree package), which supersedes C4.5.

When dealing with proprietary methodology, it's (practically)
impossible to study the properties of the method
thoroughly. Personally, I feel uncomfortable using a method that can't
be evaluated objectively by fellow researchers. It may be OK if the
application, unlike those in Biostatistics, has nothing to do with
human experimentation. Since most (if not all) applications of Data
Mining are in commerce, the risk of using unproven methodology that
hasn't been extensively scrutinized may be acceptable.

Perhaps this joke is true after all: when a Statistician gets an idea,
she/he'll write and publish a paper, whereas when a Computer Scientist
gets an idea, she/he'll form a company. :)

Comments?



--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com







=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================