[ 
https://issues.apache.org/jira/browse/MAHOUT-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399274#comment-13399274
 ] 

Sean Owen commented on MAHOUT-985:
----------------------------------

I want to commit this. It seems to fail in the dense vector case. I'm guessing 
that, in that case, the 'split' items need to be trimmed, and need to be 
checked for "?", right?
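
A minimal sketch of what that guess could look like, assuming a dense line whose attributes are all numeric. The class and method names here are hypothetical and are not taken from the attached patch or from Mahout's code, and mapping "?" to NaN is only one possible way to represent a missing value:

```java
import java.util.Arrays;

class DenseLineSketch {
  // Hypothetical helper (not Mahout's API): parses a dense, all-numeric ARFF data line.
  static double[] parseDense(String line) {
    String[] split = line.split(",");
    double[] values = new double[split.length];
    for (int i = 0; i < split.length; i++) {
      // Trim each item so that " 3" or "? " is handled like "3" or "?".
      String token = split[i].trim();
      // ARFF marks a missing value with "?"; using NaN for it is an assumption.
      values[i] = "?".equals(token) ? Double.NaN : Double.parseDouble(token);
    }
    return values;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(parseDense("1.5, ? ,3")));
  }
}
```

Nominal attributes would still need to go through the model's label lookup, so this only illustrates the trim and "?" check.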
                
> MapBackedArffModel Unable To Parse ARFF Files Containing Instance Weights
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-985
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-985
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.5
>            Reporter: Dave Kor
>            Priority: Minor
>              Labels: Arff
>         Attachments: MAHOUT-985.patch
>
>
> When parsing an ARFF file that contains instance-specific weights, 
> MapBackedARFFModel throws the following NullPointerException. While I have 
> only tested this in 0.5, I suspect this bug also occurs in 0.6.
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.mahout.utils.vectors.arff.MapBackedARFFModel.getValue(MapBackedARFFModel.java:87)
>         at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:75)
>         at org.apache.mahout.utils.vectors.arff.ARFFIterator.computeNext(ARFFIterator.java:30)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>         at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
>         at org.apache.mahout.utils.vectors.arff.Driver.writeFile(Driver.java:159)
>         at org.apache.mahout.utils.vectors.arff.Driver.main(Driver.java:127)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> The code works properly when all instance weights are set to the default 
> value of 1. However, when any instance has a non-default weight, as in 
> the sample ARFF file below, the NPE occurs when MapBackedARFFModel attempts 
> to parse line 8. 
> -----
> @relation 'Test Mahout'
> @attribute Attr0 numeric
> @attribute Label {True,False}
> @data
> 0,False
> 1,True,{2}
> -----
> The reason is that Weka assumes every data instance has a default weight 
> of 1 and does not save this default weight in the ARFF file. However, when 
> a data instance does not have the default weight of 1, the non-default 
> instance weight is appended to the end of the line, surrounded by curly 
> braces. When the MapBackedARFFModel.getValue method tries to parse this 
> weight as an attribute, typeMap.get(idx) returns a null ARFFType because 
> there is no such attribute, which results in an NPE. 
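
One way to picture the handling described above is to separate a trailing `{weight}` from the data line before the remaining items are parsed as attributes. The sketch below is only an illustration under that assumption: `InstanceWeightSketch`, `stripWeight`, and `weightOf` are hypothetical names, not part of Mahout or of the attached patch, and the regular expression has only been reasoned through for simple lines such as the sample above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InstanceWeightSketch {
  // Matches a trailing ",{weight}" at the end of a data line, e.g. "1,True,{2}".
  // Group 1 is the attribute part, group 2 is the weight text.
  private static final Pattern WEIGHT =
      Pattern.compile("^(.*),\\s*\\{\\s*([^{}]+?)\\s*\\}\\s*$");

  // Returns the line without its trailing weight, or the line unchanged if none is present.
  static String stripWeight(String line) {
    Matcher m = WEIGHT.matcher(line);
    return m.matches() ? m.group(1) : line;
  }

  // Returns the instance weight, falling back to the default of 1 when none is given.
  static double weightOf(String line) {
    Matcher m = WEIGHT.matcher(line);
    return m.matches() ? Double.parseDouble(m.group(2)) : 1.0;
  }

  public static void main(String[] args) {
    System.out.println(stripWeight("1,True,{2}") + " weight=" + weightOf("1,True,{2}"));
    System.out.println(stripWeight("0,False") + " weight=" + weightOf("0,False"));
  }
}
```

With the weight removed, the number of remaining items matches the number of declared attributes, so a lookup like typeMap.get(idx) is never asked for an index that has no attribute.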

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
