[https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589902#comment-13589902]
Sara Del Río García commented on MAHOUT-145:
--------------------------------------------
Hello Deneche A. Hakim,
I'm testing the partial implementation of Random Forests on Hadoop 2.0.0-cdh4.1.1.
I'm trying to modify the algorithm; all I do is add more information to the
leaves of the tree. Currently a leaf stores only the label, and I want to add
one more field:
@Override
public void readFields(DataInput in) throws IOException {
  label = in.readDouble();
  leafWeight = in.readDouble();
}

@Override
protected void writeNode(DataOutput out) throws IOException {
  out.writeDouble(label);
  out.writeDouble(leafWeight);
}
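For reference, the rule such a change must obey is that readFields consumes exactly the fields, in exactly the order, that writeNode produced. Below is a minimal, self-contained sketch of that symmetry, using plain java.io streams instead of Hadoop's Writable machinery; LeafSketch is a hypothetical stand-in for Mahout's Leaf, not the actual class:

```java
import java.io.*;

// Hypothetical stand-in for org.apache.mahout.classifier.df.node.Leaf,
// round-tripped through plain java.io streams (no Hadoop dependency).
public class LeafSketch {
  double label;
  double leafWeight; // the extra field added in the patch above

  // Write side: two doubles, in a fixed order.
  void writeNode(DataOutput out) throws IOException {
    out.writeDouble(label);
    out.writeDouble(leafWeight);
  }

  // Read side: must mirror writeNode field for field.
  void readFields(DataInput in) throws IOException {
    label = in.readDouble();
    leafWeight = in.readDouble();
  }

  public static void main(String[] args) throws IOException {
    LeafSketch written = new LeafSketch();
    written.label = 1.0;
    written.leafWeight = 0.25;

    // Round-trip through an in-memory buffer.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    written.writeNode(new DataOutputStream(buf));

    LeafSketch read = new LeafSketch();
    read.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    System.out.println(read.label + " " + read.leafWeight); // 1.0 0.25
  }
}
```

If a round trip like this succeeds in isolation but the job still fails, one possibility is that the bytes being read were not produced by the patched writeNode (for example, output left over from a run with the old one-field format, or a stale jar on the cluster).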
And I get the following error:
13/02/27 06:53:27 INFO mapreduce.BuildForest: Partial Mapred implementation
13/02/27 06:53:27 INFO mapreduce.BuildForest: Building the forest...
13/02/27 06:53:27 INFO mapreduce.BuildForest: Weights Estimation: IR
13/02/27 06:53:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/02/27 06:53:39 INFO input.FileInputFormat: Total input paths to process : 1
13/02/27 06:53:39 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
13/02/27 06:53:39 WARN snappy.LoadSnappy: Snappy native library not loaded
13/02/27 06:53:39 INFO mapred.JobClient: Running job: job_201302270205_0013
13/02/27 06:53:40 INFO mapred.JobClient: map 0% reduce 0%
13/02/27 06:54:18 INFO mapred.JobClient: map 20% reduce 0%
13/02/27 06:54:42 INFO mapred.JobClient: map 40% reduce 0%
13/02/27 06:55:03 INFO mapred.JobClient: map 60% reduce 0%
13/02/27 06:55:26 INFO mapred.JobClient: map 70% reduce 0%
13/02/27 06:55:27 INFO mapred.JobClient: map 80% reduce 0%
13/02/27 06:55:49 INFO mapred.JobClient: map 100% reduce 0%
13/02/27 06:56:04 INFO mapred.JobClient: Job complete: job_201302270205_0013
13/02/27 06:56:04 INFO mapred.JobClient: Counters: 24
13/02/27 06:56:04 INFO mapred.JobClient: File System Counters
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes read=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes written=1828230
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of large read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of write operations=0
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes read=1381649
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes written=1680
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of read operations=30
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of write operations=10
13/02/27 06:56:04 INFO mapred.JobClient: Job Counters
13/02/27 06:56:04 INFO mapred.JobClient: Launched map tasks=10
13/02/27 06:56:04 INFO mapred.JobClient: Data-local map tasks=10
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=254707
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Map-Reduce Framework
13/02/27 06:56:04 INFO mapred.JobClient: Map input records=20
13/02/27 06:56:04 INFO mapred.JobClient: Map output records=10
13/02/27 06:56:04 INFO mapred.JobClient: Input split bytes=1540
13/02/27 06:56:04 INFO mapred.JobClient: Spilled Records=0
13/02/27 06:56:04 INFO mapred.JobClient: CPU time spent (ms)=12070
13/02/27 06:56:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=949579776
13/02/27 06:56:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8412340224
13/02/27 06:56:04 INFO mapred.JobClient: Total committed heap usage (bytes)=478412800
READ
nodetype: 0
Exception in thread "main" java.lang.IllegalStateException: java.io.EOFException
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:104)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:38)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:129)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:96)
    at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:312)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:246)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:200)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:270)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at java.io.DataInputStream.readDouble(DataInputStream.java:451)
    at org.apache.mahout.classifier.df.node.Leaf.readFields(Leaf.java:136)
    at org.apache.mahout.classifier.df.node.Node.read(Node.java:85)
    at org.apache.mahout.classifier.df.mapreduce.MapredOutput.readFields(MapredOutput.java:64)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2114)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2242)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95)
    ... 10 more
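This kind of EOFException can be reproduced in isolation: when a record contains fewer bytes than readFields tries to consume (here, one double where two are expected), DataInputStream.readDouble runs off the end of the stream. The following is only a sketch of that failure mode with a hypothetical class, not Mahout code:

```java
import java.io.*;

// Sketch of the failure mode: a record serialized with ONE double (old
// format: label only) being deserialized by code that expects TWO.
public class EofRepro {

  static String readTwoDoublesFrom(byte[] record) {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
    try {
      double label = in.readDouble();      // succeeds: 8 bytes available
      double leafWeight = in.readDouble(); // fails: the stream is exhausted
      return "ok: " + label + " " + leafWeight;
    } catch (EOFException e) {
      return "EOFException";
    } catch (IOException e) {
      return "IOException";
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeDouble(0.5); // old on-disk format: label only
    out.close();

    System.out.println(readTwoDoublesFrom(buf.toByteArray())); // EOFException
  }
}
```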
What is the problem?
Could you try writing something extra to the leaves of the tree yourself? Anything at all.
Thank you very much.
Best regards,
Sara
> PartialData mapreduce Random Forests
> ------------------------------------
>
> Key: MAHOUT-145
> URL: https://issues.apache.org/jira/browse/MAHOUT-145
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Affects Versions: 0.2
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
> Priority: Minor
> Fix For: 0.2
>
> Attachments: partial_August_10.patch, partial_August_13.patch,
> partial_August_15.patch, partial_August_17.patch, partial_August_19.patch,
> partial_August_24.patch, partial_August_27.patch, partial_August_2.patch,
> partial_August_31.patch, partial_August_9.patch, partial_Sep_15.patch,
> partial_Sep_30.patch
>
>
> This implementation is based on a suggestion by Ted:
> "modify the original algorithm to build multiple trees for different portions
> of the data. That loses some of the solidity of the original method, but
> could actually do better if the splits exposed non-stationary behavior."
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira