ARFFVectorIterable/MapBackedArffModel doesn't handle question mark '?', other 
ARFF issues
-----------------------------------------------------------------------------------------

                 Key: MAHOUT-952
                 URL: https://issues.apache.org/jira/browse/MAHOUT-952
             Project: Mahout
          Issue Type: Bug
          Components: Integration
    Affects Versions: 0.6
         Environment: Latest SVN on ubuntu
            Reporter: Stuart Smith
            Priority: Minor


Whatever is parsing the ARFF file for the ARFFVectorIterable (as far as I can 
tell, it's the class itself) doesn't handle '?' as the marker for an unknown 
(missing) value. 
See: http://www.cs.waikato.ac.nz/~ml/weka/arff.html  
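
For illustration, here is a minimal sketch of the behaviour I'd expect for a 
dense data row with numeric attributes. It is not Mahout code: the class and 
method names are hypothetical, and mapping '?' to NaN is only one possible 
convention (assumed here), not something the current parser does. It also 
ignores quoted values that contain commas.

```java
import java.util.Arrays;

public class ArffMissingValueSketch {

    /**
     * Hypothetical helper (not part of Mahout): parses one dense ARFF data
     * row of numeric attributes, mapping the '?' marker to Double.NaN.
     */
    static double[] parseDenseRow(String line) {
        // limit -1 keeps trailing empty tokens instead of silently dropping them
        String[] tokens = line.split(",", -1);
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i].trim();
            // Assumption: an unknown value is represented as NaN
            values[i] = "?".equals(token) ? Double.NaN : Double.parseDouble(token);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] row = parseDenseRow("5.1, ?, 1.4, 0.2");
        System.out.println(Arrays.toString(row));
    }
}
```

Whether NaN, a skipped entry, or some default is the right representation 
probably depends on what the downstream classifiers expect.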

I just started looking at Mahout classifiers this week, so I'm not sure how to 
handle this yet. If I figure it out, I'll post a patch, but until then, 
guidance would be helpful!

Off topic, but I'm also having an issue where the labels populated in the map 
apparently aren't coming from the attribute header at the top of the file. I 
have very sparse vectors (1800+ attributes, only a few hundred set for any 
given vector), and I keep getting IndexOutOfBounds or mismatched-cardinality 
errors, depending on whether I use full ARFF or sparse ARFF. Either way, when I 
dump the labels from getModel(), it doesn't have them all, even if I parse the 
ARFF myself and call setLabel() (apparently it just throws that away). It looks 
like the DenseVectors keep thinking the cardinality is 534, when it should be 
1800+. When I know more, I'll create a new issue.
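
To make the expectation concrete, here is a rough sketch of taking the 
cardinality from the number of @attribute declarations in the header and 
using it when expanding a sparse row. Again, this is not Mahout code: the 
class and method names are hypothetical, it handles numeric values only, and 
it assumes the ARFF sparse format of "{index value, index value}" with 
0-based indices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ArffCardinalitySketch {

    /** Counts @attribute declarations that appear before @data (keywords are case-insensitive). */
    static int countAttributes(List<String> headerLines) {
        int count = 0;
        for (String line : headerLines) {
            String lower = line.trim().toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break;
            }
            if (lower.startsWith("@attribute")) {
                count++;
            }
        }
        return count;
    }

    /** Expands a sparse row such as "{0 1.5, 2 ?}" into an array of the given cardinality. */
    static double[] parseSparseRow(String line, int cardinality) {
        double[] values = new double[cardinality]; // unset entries stay 0.0
        String body = line.trim();
        body = body.substring(1, body.length() - 1).trim(); // strip the braces
        if (body.isEmpty()) {
            return values;
        }
        for (String entry : body.split(",")) {
            String[] parts = entry.trim().split("\\s+", 2);
            int index = Integer.parseInt(parts[0]);
            if (index < 0 || index >= cardinality) {
                throw new IllegalArgumentException(
                    "index " + index + " outside header cardinality " + cardinality);
            }
            String value = parts[1].trim();
            // Assumption: '?' (unknown value) is represented as NaN
            values[index] = "?".equals(value) ? Double.NaN : Double.parseDouble(value);
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
            "@relation example",
            "@attribute a numeric",
            "@attribute b numeric",
            "@attribute c numeric",
            "@data");
        int cardinality = countAttributes(header);
        System.out.println(Arrays.toString(parseSparseRow("{0 1.5, 2 ?}", cardinality)));
    }
}
```

With the cardinality fixed by the header up front, a sparse row that only 
mentions low indices still produces a vector of the full width, which is what 
I'd expect instead of the 534 I'm seeing.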

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
