An example of specialized knowledge:

Last Friday, a colleague showed me how he was using a data mining program to cluster 
over 1000 genes using 5 variables.  After clustering, he used the program to generate 
a pretty, spinnable 3-D plot of his data on 3 of the original variables.  It had 
color-coded clusters, and one could click on a plotted point to pop up its ID number 
and variable values.

Some problems:
  
1)  Four of the variables were measured on a scale of 0-2; the 5th was on a scale of 
0-107.  He had no idea that, with the distance measure he was using (Euclidean), 
his clustering could be dominated by that 5th variable.

2) He chose a final cluster solution of two clusters simply because the program 
suggested that was the best solution (without indicating why).  But he was using 
k-means clustering, and was setting his initial estimate of the number of clusters to 2.

3) He clearly had some outliers in his data set that were being masked.

4) He didn't realize that a different choice of 3 variables for plotting could result 
in a very different picture of his data.

5) He had chosen his plotting symbol to be large enough that, when points had similar 
coordinates, some points were hidden.
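
To make problem 1 concrete, here is a minimal sketch in Python (NumPy only).  It 
does not use my colleague's data: the 1000 x 5 array below is synthetic, drawn 
uniformly on the scales described above, and the helper name distance_share is 
made up for this illustration.  It relies on the fact that, for a single variable, 
the sum of squared differences over all pairs of points is proportional to that 
variable's variance, so each variable's share of the total squared Euclidean 
distance is just its share of the total variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in (an assumption, not the real gene data):
# four variables on a 0-2 scale, a fifth on a 0-107 scale.
n = 1000
X = np.column_stack([
    rng.uniform(0, 2, size=(n, 4)),
    rng.uniform(0, 107, size=(n, 1)),
])


def distance_share(data):
    """Share of the total squared Euclidean distance (summed over all
    pairs of rows) that each column contributes.

    For one column, the sum over pairs of (x_i - x_j)^2 is proportional
    to the column's variance, so the share reduces to a variance share.
    """
    variances = data.var(axis=0)
    return variances / variances.sum()


# Shares on the raw scales: the 0-107 variable should account for
# nearly all of the distance.
raw_share = distance_share(X)

# Standardize each column to mean 0 and standard deviation 1 (z-scores),
# one common remedy; other rescalings are possible.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_share = distance_share(Z)

print("raw shares:   ", np.round(raw_share, 4))
print("scaled shares:", np.round(scaled_share, 4))
```

After standardizing, every variable contributes equally to the distance.  Whether 
that is what one wants is itself a judgment call that depends on the science, which 
is rather the point of this whole thread.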


I pointed out some of these issues, we played with the data and the output, and it was 
a learning experience for both of us: he gained some knowledge of stats and I got to 
see some of the advantages/disadvantages of a data mining program.  I suppose the 
programs can be useful tools in the right hands; this comes from someone who, as a 
kid, didn't know that a hatchet was not the preferred tool for chopping ice off a roof.

rick



--- "Donald F. Burrill" wrote:
Thanks, Ellen.  Evocative quote, isn't it?  It's that "without requiring 
*any* (!) specialized knowledge" that will be the dangerous part, if read 
too literally by the naive.  
        Interesting that you could get to Lim's URL at all.  When _I_ 
tried it, several days ago, the system seemed to be trying to tell me that 
the  /forums  part of the URL wasn't accessible.  But perhaps the problem 
was only temporary.
                                -- Don.

On Sun, 30 Apr 2000, Ellen Hertz wrote:

> I looked up one and copied it:
> 
>       "For the first time, thanks to the increased power of computers, 
> new methods replace the skill of the statistical artisan with  
> massive-computational methods, obtaining equal or better results in far 
> less time without requiring any specialised knowledge."
> 
> In all fairness, I haven't read the whole paper and if he is referring 
> purely to computations such as generating maximum likelihood estimates 
> or inverting matrices, he is quite right that computers beat pencils.  
> If he means to just run programs without knowing what they mean 

... "untouched by the human mind", as Heidi Kass used to put it ...

> and generate GIGO, that certainly is dangerous.
                                                        Ayuh.  -- DFB.
> Ellen Hertz

 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  



===========================================================================
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===========================================================================
--- end of quote ---


