[ 
https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880200#comment-13880200
 ] 

Dmitriy Lyubimov commented on MAHOUT-1408:
------------------------------------------

I take it you are trying to use the SSVD solver in some sort of embedded mode,
rather than through the pure Mahout CLI?
Still, I am not sure why you want to wrest control over MapReduce from the
SSVD solver in its individual MR steps. Additional jars will not get there
(nor are they needed by the SSVD jobs). The Mahout architecture in general,
and this pipeline in particular, does not assume you get to manipulate
individual job settings. This pipeline step legitimately expects the cache to
contain the files that the SSVD pipeline has put into it.

I would like to place the burden on you to explain why you think the SSVD
pipeline should expect someone to be changing its MR settings.

Assuming, however, that your reasons are valid, this (the BtJob MR step) would
not be the only MR step in the SSVD pipeline that uses the cache, so this
patch would not be sufficient to handle it throughout.


> Distributed cache file matching bug while running SSVD in broadcast mode
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1408
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1408
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.8
>            Reporter: Angad Singh
>            Assignee: Dmitriy Lyubimov
>            Priority: Minor
>         Attachments: BtJob.java.patch
>
>
> The error is:
> java.lang.IllegalArgumentException: Unexpected file name, unable to deduce partition #:file:/data/d1/mapred/local/taskTracker/distcache/434503979705629827_-1822139941_1047712745/nn.red.ua2.inmobi.com/user/rmcuser/oozie-oozi/0034272-140120102756143-oozie-oozi-W/inmobi-ssvd_mahout--java/java-launcher.jar
>       at org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:154)
>       at org.apache.mahout.math.hadoop.stochasticsvd.SSVDHelper$1.compare(SSVDHelper.java:1)
>       at java.util.Arrays.mergeSort(Arrays.java:1270)
>       at java.util.Arrays.mergeSort(Arrays.java:1281)
>       at java.util.Arrays.mergeSort(Arrays.java:1281)
>       at java.util.Arrays.sort(Arrays.java:1210)
>       at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.init(SequenceFileDirValueIterator.java:112)
>       at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:94)
>       at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:220)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>       at org.apache.hadoop.mapred.Child.main(Child.java:260)
> The bug is in
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/BtJob.java
> near line 220, and in
> https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/SSVDHelper.java
> near line 144.
> SSVDHelper's PARTITION_COMPARATOR assumes all files in the distributed cache 
> will have a particular pattern whereas we have jar files in our distributed 
> cache which causes the above exception.
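
To illustrate the failure mode described above, here is a minimal sketch of
one way to make the sort tolerate foreign files: drop cache entries that do
not look like partition files before comparing them. This is not the code in
SSVDHelper or in the attached patch. The class name, the method names, the
regular expression (based on the usual Hadoop "part-m-00000" / "part-r-00000"
output naming), and the sample paths are all assumptions made for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionFileFilter {

    // Assumed naming convention for MapReduce output files, e.g. "part-m-00003".
    // Hypothetical pattern, not copied from SSVDHelper.
    private static final Pattern PART_FILE = Pattern.compile("-[mr]-(\\d+)$");

    /** Returns the partition number encoded in the path, or -1 if the path is not a part file. */
    public static int partitionOf(String path) {
        Matcher m = PART_FILE.matcher(path);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Keeps only part files and orders them by partition number. */
    public static List<String> sortedPartitionFiles(List<String> cacheFiles) {
        List<String> parts = new ArrayList<>();
        for (String f : cacheFiles) {
            if (partitionOf(f) >= 0) {
                parts.add(f); // entries such as jars are skipped instead of raising an exception
            }
        }
        parts.sort(Comparator.comparingInt(PartitionFileFilter::partitionOf));
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical cache contents: two part files plus an unrelated launcher jar.
        List<String> cache = Arrays.asList(
            "file:/cache/Q-job/part-m-00010",
            "file:/cache/java-launcher.jar",
            "file:/cache/Q-job/part-m-00002");
        for (String f : sortedPartitionFiles(cache)) {
            System.out.println(f);
        }
    }
}
```

As the comment above notes, BtJob is not the only step in the SSVD pipeline
that reads from the cache, so a filter like this would have to be applied
wherever cache files are enumerated, not just in one mapper.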



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
