a possible bug with the isConverged() method in KMeansDriver.java
-----------------------------------------------------------------

                 Key: MAHOUT-326
                 URL: https://issues.apache.org/jira/browse/MAHOUT-326
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.2
            Reporter: Chad Chen


In one of today's test runs using the clustering example from the book 
"Mahout in Action", I noticed the following exception thrown by 
KMeansClusterMapper:

----------------------------
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 13 more
Caused by: java.lang.NullPointerException: Cluster is empty!!!
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.configure(KMeansClusterMapper.java:63)
---------------------------

which indicates that the runClustering method didn't see the cluster output. 
The same map task did finally succeed after a few failed attempts.

After looking into KMeansDriver.java, I think there may be a bug in the 
isConverged method. Basically, this method doesn't wait for the cluster output 
file to be fully populated. If the part-* file doesn't exist yet or has not 
been fully written, this method can return true prematurely. I am not sure 
whether this is a bug in Hadoop itself, since it may report a job as 
successful before the mapred output file is fully written. In the meantime, a 
possible fix is to make the isConverged method wait for the cluster output 
file to exist and verify that the file contains the 'converged' values for all 
the clusters.
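
A minimal sketch of the idea above, not the actual KMeansDriver code. To stay 
self-contained it uses the local filesystem (java.nio) rather than the Hadoop 
FileSystem API, and it assumes a made-up text format for the part-* files 
(one line per cluster: id, tab, true/false). The class name, method names, 
and timeout parameters are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real driver reads Hadoop SequenceFiles.
class ConvergenceCheckSketch {

    // Returns the non-empty part-* files currently present in dir (possibly none).
    static List<Path> listNonEmptyPartFiles(Path dir) throws IOException {
        List<Path> parts = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return parts;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    parts.add(p);
                }
            }
        }
        return parts;
    }

    // Polls until at least one non-empty part-* file shows up or the timeout expires.
    static List<Path> waitForPartFiles(Path dir, long timeoutMillis, long pollMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            List<Path> parts = listNonEmptyPartFiles(dir);
            if (!parts.isEmpty() || System.currentTimeMillis() >= deadline) {
                return parts;
            }
            Thread.sleep(pollMillis);
        }
    }

    // Reports convergence only if output exists and every cluster line says "true".
    static boolean isConverged(Path clusterDir, long timeoutMillis, long pollMillis)
            throws IOException, InterruptedException {
        List<Path> parts = waitForPartFiles(clusterDir, timeoutMillis, pollMillis);
        if (parts.isEmpty()) {
            return false; // missing output is never treated as "converged"
        }
        for (Path part : parts) {
            for (String line : Files.readAllLines(part)) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t");
                // Assumed format: <clusterId> TAB <true|false>
                if (fields.length < 2 || !Boolean.parseBoolean(fields[1].trim())) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The key design choice is that missing or empty output makes the check return 
false instead of true. A non-empty file is still not proof that the file has 
been fully written, so a real fix would probably also need some completion 
signal from the job.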

Please note that I have seen this problem only once in the many test runs I 
have done so far, so it may be difficult to reproduce. If you need any further 
information, please let me know.

Thanks.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
