[ 
https://issues.apache.org/jira/browse/MAHOUT-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Davydov updated MAHOUT-1128:
-----------------------------------

    Environment: 
I work on a Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual machines with 
Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
I run my application from a separate "client" machine, and it submits tasks to 
the cluster.



  was:
I work on Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual computers with 
Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
I run my application from separated "clien" machine and it submit tasks to 
cluster.



    
>  MAHOUT-999 issue still present
> ------------------------------
>
>                 Key: MAHOUT-1128
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1128
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>         Environment: I work on a Hadoop 1.0.3 cluster deployed on Amazon EC2 
> virtual machines with Ubuntu 11 and mahout-core.jar 0.7 from maven-central.
> I run my application from a separate "client" machine, and it submits tasks to 
> the cluster.
>            Reporter: Andrey Davydov
>
> I'm sorry, my English is not good and I'm a newbie with Mahout, but it seems 
> that the MAHOUT-999 issue is still present.
> I use mahout-core 0.7 loaded from maven-central and I got the same failure. 
> I've investigated the sources and found the following in the 
> org.apache.mahout.clustering.classify.ClusterClassifier class:
>   public void writeToSeqFiles(Path path) throws IOException {
>     writePolicy(policy, path);
>     // BUG: a fresh default Configuration ignores the job's fs.default.name
>     Configuration config = new Configuration();
>     FileSystem fs = FileSystem.get(path.toUri(), config);
>     SequenceFile.Writer writer = null;
>     ClusterWritable cw = new ClusterWritable();
>     for (int i = 0; i < models.size(); i++) {
> ...
>       } finally {
>         Closeables.closeQuietly(writer);
>       }
>     }
>   }
>   
>   public void readFromSeqFiles(Configuration conf, Path path) throws 
> IOException {
>     // BUG: the 'conf' parameter above is ignored; a fresh default is created
>     Configuration config = new Configuration();
>     List<Cluster> clusters = Lists.newArrayList();
>     for (ClusterWritable cw : new 
> SequenceFileDirValueIterable<ClusterWritable>(path, PathType.LIST,
>         PathFilters.logsCRCFilter(), config)) {
> ...
>     }
>     this.models = clusters;
>     modelClass = models.get(0).getClass().getName();
>     this.policy = readPolicy(path);
>   }
> Both methods create a new default Configuration, so they end up working with 
> the local file system: KMeansDriver writes the initial clusters to the local 
> file system of the "client" machine, and CIMapper then tries to read them 
> from the local file system of a cluster node.
> It seems the current implementation can only work on a pseudo-distributed 
> Hadoop setup. I think ClusterClassifier should store intermediate results in 
> HDFS, using the Configuration passed in by the user through the API; a small 
> sketch of the pattern follows.
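>
> To make the pattern concrete, here is a small self-contained sketch (the 
> class and method names are made up for illustration, not Mahout API): every 
> file-system call goes through the caller's Configuration, so an hdfs:// path 
> resolves against the cluster, while new Configuration() would fall back to 
> the local defaults of whichever machine happens to run the code.
>
>   import java.io.IOException;
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>   import org.apache.hadoop.io.IntWritable;
>   import org.apache.hadoop.io.SequenceFile;
>   import org.apache.hadoop.io.Text;
>
>   /** Illustrative helper, not Mahout code: all FS access uses the caller's conf. */
>   public final class ConfAwareSeqFile {
>
>     private ConfAwareSeqFile() {}
>
>     /** Writes one key/value pair; the path resolves against the caller's file system. */
>     public static void write(Configuration conf, Path path) throws IOException {
>       // Use the conf supplied by the caller, never new Configuration()
>       FileSystem fs = FileSystem.get(path.toUri(), conf);
>       SequenceFile.Writer writer = null;
>       try {
>         writer = SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class);
>         writer.append(new IntWritable(0), new Text("cluster-0"));
>       } finally {
>         if (writer != null) {
>           writer.close();
>         }
>       }
>     }
>
>     /** Reads the value back through the same caller-supplied conf. */
>     public static String read(Configuration conf, Path path) throws IOException {
>       FileSystem fs = FileSystem.get(path.toUri(), conf);
>       SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
>       try {
>         IntWritable key = new IntWritable();
>         Text value = new Text();
>         reader.next(key, value);
>         return value.toString();
>       } finally {
>         reader.close();
>       }
>     }
>   }
>
> With this shape, KMeansDriver could hand its own job Configuration down to 
> ClusterClassifier, and CIMapper would read the clusters from the same HDFS 
> location the driver wrote them to.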

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
