[ https://issues.apache.org/jira/browse/MAHOUT-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844597#action_12844597 ]
Chad Chen commented on MAHOUT-326:
----------------------------------

Hi Robin,

The bug does seem to be very difficult to reproduce. I have run the same test many more times and have not seen the problem again.

Unfortunately, I lost the original console logging output. Otherwise, I would be able to tell whether or not the jc.monitorAndPrintJob call in the following method returned true:

    public static RunningJob runJob(JobConf job) throws IOException {
      JobClient jc = new JobClient(job);
      RunningJob rj = jc.submitJob(job);
      try {
        if (!jc.monitorAndPrintJob(job, rj)) {
          throw new IOException("Job failed!");
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
      return rj;
    }

However, I am a little confused by the following code block:

    private static boolean runIteration(..) {
      ...
      try {
        JobClient.runJob(conf);
        FileSystem fs = FileSystem.get(outPath.toUri(), conf);
        return isConverged(clustersOut, conf, fs);
      } catch (IOException e) {
        log.warn(e.toString(), e);
        return true;
      }
    }

So if the call to JobClient.runJob throws an IOException, runIteration will return true? In that case, the runClustering method may encounter the same problem I saw (i.e., the cluster output file was not ready). Is my understanding correct?

Thanks.
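To make the question concrete, here is a minimal, Hadoop-free sketch of the control flow described above. JobStep, ConvergenceCheck, and both method names are hypothetical stand-ins introduced only for illustration; they are not Mahout or Hadoop APIs. The first variant mirrors the quoted catch block, and the second is one possible alternative (an assumption, not a proposed patch) that lets the failure reach the caller.

```java
import java.io.IOException;

/** Illustration only: every name here is hypothetical, not Mahout or Hadoop API. */
public class RunIterationSketch {

    /** Stand-in for the JobClient.runJob(conf) call. */
    interface JobStep {
        void run() throws IOException;
    }

    /** Stand-in for the isConverged(clustersOut, conf, fs) call. */
    interface ConvergenceCheck {
        boolean isConverged() throws IOException;
    }

    /** Mirrors the quoted catch block: an IOException is logged and reported as "converged". */
    static boolean runIterationSwallowing(JobStep job, ConvergenceCheck check) {
        try {
            job.run();
            return check.isConverged();
        } catch (IOException e) {
            System.err.println("WARN " + e);
            return true; // a failed job is indistinguishable from convergence
        }
    }

    /** Alternative: propagate the failure so the caller cannot mistake it for convergence. */
    static boolean runIterationPropagating(JobStep job, ConvergenceCheck check)
            throws IOException {
        job.run();
        return check.isConverged();
    }

    public static void main(String[] args) {
        JobStep failingJob = () -> { throw new IOException("Job failed!"); };
        ConvergenceCheck notConverged = () -> false;

        System.out.println("swallowing variant returned: "
                + runIterationSwallowing(failingJob, notConverged));
        try {
            runIterationPropagating(failingJob, notConverged);
        } catch (IOException e) {
            System.out.println("propagating variant threw: " + e.getMessage());
        }
    }
}
```

With the swallowing variant, a caller that loops until the iteration reports true would stop after the failed job and go on to read cluster output that was never written. That matches the symptom in the report, although it has not been confirmed as the actual cause.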
> a possible bug with the isConverged() method in KMeansDriver.java
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-326
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-326
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.2
>            Reporter: Chad Chen
>         Attachments: mahout_bug.png
>
>
> In one of my test runs today using the clustering example from the book
> "Mahout in Action", I noticed the following exception thrown by
> KMeansClusterMapper:
> ----------------------------
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 5 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     ... 10 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at *** org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 13 more
> Caused by: java.lang.NullPointerException: Cluster is empty!!!
>     at *** org.apache.mahout.clustering.kmeans.KMeansClusterMapper.configure(KMeansClusterMapper.java:63)
> ---------------------------
> which says that the runClustering method didn't see the cluster output. The
> same map task did finally succeed after a few failed attempts.
> After looking into KMeansDriver.java, I think there may be a bug in the
> isConverged method. Basically, this method doesn't wait for the cluster
> output file to be fully populated. If the part-* file doesn't exist yet or
> has not been fully written, then this method can return true prematurely.
> I am not sure whether this is a bug in Hadoop itself, because it may report
> a successful job before the mapred output file is fully written. Meanwhile,
> a possible way to fix this problem is to force the isConverged method to
> wait for the existence of the cluster output file and make sure the file
> contains the 'converged' values for all the clusters.
> Please note, I have seen this problem only once in the many test runs I
> have had so far. It may be a little difficult to reproduce. If you need
> any further information, please let me know.
> Thanks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.