[
https://issues.apache.org/jira/browse/MAHOUT-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001434#comment-14001434
]
Sergey commented on MAHOUT-1498:
--------------------------------
Great, nice to hear it.
It looks like I have a similar problem here. It appears when running as an
Oozie java action. I will investigate it and create a separate ticket if I find
the root cause. ClusterClassificationDriver is much more difficult
to read than the previously patched modules.
{code}
at org.apache.mahout.clustering.classify.ClusterClassificationDriver.classifyClusterMR(ClusterClassificationDriver.java:276)
at org.apache.mahout.clustering.classify.ClusterClassificationDriver.run(ClusterClassificationDriver.java:135)
at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:372)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:158)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:117)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.clustering.canopy.CanopyDriver.main(CanopyDriver.java:64)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
{code}
> DistributedCache.setCacheFiles in DictionaryVectorizer overwrites jars pushed
> using oozie
> -----------------------------------------------------------------------------------------
>
> Key: MAHOUT-1498
> URL: https://issues.apache.org/jira/browse/MAHOUT-1498
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.7
> Environment: mahout-core-0.7-cdh4.4.0.jar
> Reporter: Sergey
> Assignee: Sebastian Schelter
> Labels: patch
> Fix For: 1.0
>
> Attachments: MAHOUT-1498.patch
>
>
> Hi, I get an exception:
> {code}
> <<< Invocation of Main class completed <<<
> Failing Oozie Launcher, Main class [org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles], main() threw exception, Job failed!
> java.lang.IllegalStateException: Job failed!
> at org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
> at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
> at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:271)
> {code}
> The root cause is:
> {code}
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> {code}
> It looks like this is caused by the
> DictionaryVectorizer.makePartialVectors method.
> It contains this code:
> {code}
> DistributedCache.setCacheFiles(new URI[] {dictionaryFilePath.toUri()}, conf);
> {code}
> which overwrites the jars that Oozie pushes with the job:
> {code}
> public static void setCacheFiles(URI[] files, Configuration conf) {
> String sfiles = StringUtils.uriToString(files);
> conf.set("mapred.cache.files", sfiles);
> }
> {code}
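To make the overwrite easier to see, here is a minimal, self-contained sketch. It does not use Hadoop at all: a plain `Map` stands in for `Configuration`, the class name `CacheFilesSketch` and both file paths are hypothetical, and the two helpers only imitate the set-versus-append behaviour described above. Hadoop's `DistributedCache` also offers `addCacheFile(URI, Configuration)`, which appends to the existing value; whether the attached patch takes that route is not stated here, so treat the append variant as one possible approach rather than the actual fix.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheFilesSketch {
    // Same property name as in the quoted setCacheFiles method.
    static final String KEY = "mapred.cache.files";

    // Imitates the quoted setCacheFiles: replaces whatever was stored before.
    static void setCacheFiles(String[] files, Map<String, String> conf) {
        conf.put(KEY, String.join(",", files));
    }

    // Imitates an append: keeps the existing entries and adds one more.
    static void addCacheFile(String file, Map<String, String> conf) {
        String existing = conf.get(KEY);
        boolean empty = existing == null || existing.isEmpty();
        conf.put(KEY, empty ? file : existing + "," + file);
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        String oozieJar = "/user/oozie/lib/mahout-math.jar";
        String dictionary = "/tmp/dictionary.file-0";

        // Case 1: the launcher registered a jar, then the job calls set.
        Map<String, String> overwritten = new HashMap<>();
        overwritten.put(KEY, oozieJar);
        setCacheFiles(new String[] {dictionary}, overwritten);
        // The jar entry is gone; only the dictionary remains.
        System.out.println("set: " + overwritten.get(KEY));

        // Case 2: same starting point, but the job appends instead.
        Map<String, String> appended = new HashMap<>();
        appended.put(KEY, oozieJar);
        addCacheFile(dictionary, appended);
        // Both the jar and the dictionary are still listed.
        System.out.println("add: " + appended.get(KEY));
    }
}
```

In the first case the jar entry disappears from `mapred.cache.files`, which matches the ClassNotFoundException for org.apache.mahout.math.Vector seen on the task side; in the second case both entries survive.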
--
This message was sent by Atlassian JIRA
(v6.2#6252)