Andrey Davydov created MAHOUT-1128:
--------------------------------------
Summary: MAHOUT-999 issue still actual
Key: MAHOUT-1128
URL: https://issues.apache.org/jira/browse/MAHOUT-1128
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.7
Environment: Hadoop 1.0.3 cluster deployed on Amazon EC2 virtual machines
running Ubuntu 11, with mahout-core.jar 0.7 from Maven Central.
I run my application from a separate "client" machine, which submits tasks to
the cluster.
Reporter: Andrey Davydov
I'm sorry, my English is not good and I'm new to Mahout, but it seems that the
MAHOUT-999 issue is still present.
I use mahout-core 0.7 from Maven Central and I get the same failure.
I looked through the sources and found the following in the
org.apache.mahout.clustering.classify.ClusterClassifier class:
  public void writeToSeqFiles(Path path) throws IOException {
    writePolicy(policy, path);
    Configuration config = new Configuration();
    FileSystem fs = FileSystem.get(path.toUri(), config);
    SequenceFile.Writer writer = null;
    ClusterWritable cw = new ClusterWritable();
    for (int i = 0; i < models.size(); i++) {
      ...
      } finally {
        Closeables.closeQuietly(writer);
      }
    }
  }
  public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
    Configuration config = new Configuration();
    List<Cluster> clusters = Lists.newArrayList();
    for (ClusterWritable cw : new SequenceFileDirValueIterable<ClusterWritable>(path,
        PathType.LIST, PathFilters.logsCRCFilter(), config)) {
      ...
    }
    this.models = clusters;
    modelClass = models.get(0).getClass().getName();
    this.policy = readPolicy(path);
  }
Both methods create a new default Configuration (readFromSeqFiles even ignores
the conf argument it receives), so they end up working with the local file
system. That is, KMeansDriver writes the initial clusters to the local file
system of the "client" machine, and CIMapper then tries to read them from the
local file system of a cluster node.
It seems that the current implementation can work only on a pseudo-distributed
Hadoop setup. I think ClusterClassifier should store intermediate results in
HDFS, using the Configuration passed in by the user through the API.
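
To illustrate the point, here is a small toy model in plain Java. It does not
use Hadoop's real classes; it only mimics how a path without a scheme is
resolved against the default file system named in the settings. The class and
method names (DefaultFsSketch, qualify) and the namenode address are made up
for the illustration. The key fs.default.name and its file:/// default are, as
far as I know, the Hadoop 1.x settings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT Hadoop's real classes) of default file system resolution. */
public class DefaultFsSketch {

    // Hadoop 1.x key that names the default file system.
    static final String DEFAULT_FS_KEY = "fs.default.name";
    // Value assumed when nothing overrides the key.
    static final String LOCAL_FS = "file:///";

    /** Qualifies a path against the default file system named in the settings. */
    public static String qualify(String path, Map<String, String> settings) {
        if (path.contains("://")) {
            return path; // the path already carries its own scheme
        }
        String defaultFs = settings.getOrDefault(DEFAULT_FS_KEY, LOCAL_FS);
        // Drop one trailing slash so the absolute path can be appended as-is.
        String base = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        return base + path;
    }

    public static void main(String[] args) {
        // Settings the user passed in: they point at the cluster (hypothetical address).
        Map<String, String> userConf = new HashMap<>();
        userConf.put(DEFAULT_FS_KEY, "hdfs://namenode:9000");

        // Settings created from scratch: nothing overrides the default.
        Map<String, String> freshConf = new HashMap<>();

        String path = "/tmp/clusters/part-00000";
        System.out.println(qualify(path, userConf));  // hdfs://namenode:9000/tmp/clusters/part-00000
        System.out.println(qualify(path, freshConf)); // file:///tmp/clusters/part-00000
    }
}
```

The same path ends up on a different file system depending on which settings
object is used. In the real code the analogous change would probably be to
pass the caller's Configuration to FileSystem.get(path.toUri(), conf) and to
the SequenceFileDirValueIterable constructor instead of a freshly created one,
but I have not tested such a change.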