[
https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734981#comment-14734981
]
Kai Hui edited comment on MAHOUT-1408 at 9/8/15 3:22 PM:
---------------------------------------------------------
I am also experiencing the same problem at the moment. What I did was simply
call ToolRunner.run(SSVDCli, args) from within my own class; I did it that way
because I cannot install Mahout on the server (due to the Maven version there
and the fact that I have no sudo access).
I received exactly the same error, which I believe is caused by the temporary
files that the SSVD solver places in the distributed cache for the follow-up
stages of its pipeline. However, the distributed cache also contains other
files, such as jars, and these do not pass the pattern check, i.e. they do not
follow the {filename}-{p}-{number} naming pattern required by
SSVDHelper$1.compare(SSVDHelper.java:152), so the whole job fails. I would
expect that a file name filter applied before the pattern matcher would do the
job.
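A minimal sketch of that filter idea (hypothetical; the regex and variable
names are assumptions, not the actual fix in BtJob.java.patch): skip any
distributed cache entry whose name does not look like an SSVD partition file
before it ever reaches the partition comparator.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// Hypothetical pre-filter in the mapper setup (conf = context.getConfiguration()):
// keep only cache entries whose names match the assumed {name}-{x}-{number}
// partition shape; jars and other cache files are skipped instead of breaking the sort.
Path[] cachedFiles = DistributedCache.getLocalCacheFiles(conf);
List<Path> partitionFiles = new ArrayList<Path>();
if (cachedFiles != null) {
  Pattern partitionName = Pattern.compile(".+-\\w-\\d+");
  for (Path cached : cachedFiles) {
    if (partitionName.matcher(cached.getName()).matches()) {
      partitionFiles.add(cached);
    }
  }
}
// Only partitionFiles would then be sorted with SSVDHelper's partition comparator
// and fed to the SequenceFileDirValueIterator in BtJob.BtMapper.setup().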
For future reference, here is the solution that worked for me: add the jar
files to the classpath inside your Java code with
DistributedCache.addArchiveToClassPath instead of passing -libjars on the
hadoop jar command line:
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// conf is the job Configuration (e.g. obtained from Tool.getConf()).
FileSystem fs = FileSystem.get(conf);
String libdir = "/user/<username>/libjars"; // HDFS directory that holds your jars
FileStatus[] statusList = fs.listStatus(new Path(libdir));
if (statusList != null) {
  for (FileStatus status : statusList) {
    String fname = status.getPath().getName();
    if (fname.endsWith(".jar")) {
      Path path2add = new Path(libdir + "/" + fname);
      DistributedCache.addArchiveToClassPath(path2add, conf, fs);
    }
  }
}
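With the jars registered this way, you can then launch the solver
programmatically, e.g. (a sketch; args holds whatever you would otherwise pass
on the ssvd command line):
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli;

// Run SSVD with the same Configuration that carries the classpath entries added above.
int exitCode = ToolRunner.run(conf, new SSVDCli(), args);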
> Distributed cache file matching bug while running SSVD in broadcast mode
> ------------------------------------------------------------------------
>
> Key: MAHOUT-1408
> URL: https://issues.apache.org/jira/browse/MAHOUT-1408
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.8
> Reporter: Angad Singh
> Assignee: Dmitriy Lyubimov
> Priority: Minor
> Fix For: 0.10.0
>
> Attachments: BtJob.java.patch
>
>
> The error is:
> java.lang.IllegalArgumentException: Unexpected file name, unable to deduce partition #: file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
> at org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
> at org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
> at java.util.Arrays.mergeSort(Arrays.java:1270)
> at java.util.Arrays.mergeSort(Arrays.java:1281)
> at java.util.Arrays.mergeSort(Arrays.java:1281)
> at java.util.Arrays.sort(Arrays.java:1210)
> at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
> at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
> at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
> at org.apache.hadoop.mapred.Child.main(Child.java:260)
> The bug is at
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java, near line 220,
> and at
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java, near line 144.
> SSVDHelper's PARTITION_COMPARATOR assumes all files in the distributed cache
> will have a particular pattern whereas we have jar files in our distributed
> cache which causes the above exception.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)