[ 
https://issues.apache.org/jira/browse/MAHOUT-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914446#action_12914446
 ] 

Sean Owen commented on MAHOUT-402:
----------------------------------

Drew, is this related to what you just fixed for seq2sparse?

> NamedVectors are not readily identifiable in vectordumper output
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-402
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-402
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.4
>            Reporter: Drew Farris
>            Priority: Minor
>         Attachments: MAHOUT-402.patch
>
>
> When dumping a sequence file of Writable,NamedVector using vectordumper in 
> either JSON or standard format, it is not apparent in the output that the 
> vectors are indeed named vectors.
> For example, after applying MAHOUT-401 to produce NamedVectors from 
> seq2sparse, I run:
> {code}
> ./bin/mahout vectordump -j -p -s ~/mahout/reuters-out-seqdir-sparse/tf-vectors/part-00000
> {code}
> And get: 
> {code}
> Input Path: /home/drew/mahout/reuters-out-seqdir-sparse/tf-vectors/part-00000
> /reut2-000.sgm-0.txt    
> {"class":"org.apache.mahout.math.RandomAccessSparseVector","vector" [...]
> {code}
> or when removing the -j argument:
> {code}
> /reut2-000.sgm-0.txt    elts: {1026:3.0, 16150:1.0, 3338:3.0, 16147:1.0, 3339:1.0, 12240:1.0, [...]
> {code}
> The first case, when dumping JSON, is due to the fact that NamedVector simply 
> calls its delegate's asFormatString method. Granted, the naive approach of 
> implementing asFormatString in NamedVector also produces some nasty output:
> {code}
> /reut2-001.sgm-468.txt        
> {"class":"org.apache.mahout.math.NamedVector","vector":"{\"delegate\":{\"class\":\"org.apache.mahout.math.RandomAccessSparseVector\"
>  [...]
> {code}
> So a little more thought needs to be given to that approach.
> For the non-JSON format, VectorHelper.vectorToString(..) is the culprit. 
> Would it be OK to do an instanceof NamedVector here and emit the name?
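
For illustration, here is a minimal sketch of the instanceof idea from the description. It is not the attached patch and does not use the real Mahout classes: Vector, SparseVector and NamedVector below are hypothetical stand-ins, and the "name: ... elts: {...}" layout is only an assumed output format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-ins for Mahout's Vector and NamedVector; not the real classes.
final class NamedVectorDumpSketch {

    interface Vector {
        Map<Integer, Double> nonZeroElements();
    }

    // Simplified sparse vector that keeps its elements in insertion order.
    static final class SparseVector implements Vector {
        private final Map<Integer, Double> elements = new LinkedHashMap<>();

        SparseVector set(int index, double value) {
            elements.put(index, value);
            return this;
        }

        @Override
        public Map<Integer, Double> nonZeroElements() {
            return elements;
        }
    }

    // Wraps another vector and attaches a name, delegating element access.
    static final class NamedVector implements Vector {
        private final Vector delegate;
        private final String name;

        NamedVector(Vector delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        String getName() {
            return name;
        }

        @Override
        public Map<Integer, Double> nonZeroElements() {
            return delegate.nonZeroElements();
        }
    }

    // Assumed output format: an optional "name: ..." prefix, then the element list.
    static String vectorToString(Vector vector) {
        StringBuilder sb = new StringBuilder();
        if (vector instanceof NamedVector) {
            // Emit the name only when the vector actually carries one.
            sb.append("name: ").append(((NamedVector) vector).getName()).append(' ');
        }
        StringJoiner elts = new StringJoiner(", ", "elts: {", "}");
        for (Map.Entry<Integer, Double> e : vector.nonZeroElements().entrySet()) {
            elts.add(e.getKey() + ":" + e.getValue());
        }
        return sb.append(elts).toString();
    }

    public static void main(String[] args) {
        Vector plain = new SparseVector().set(1026, 3.0).set(16150, 1.0);
        System.out.println(vectorToString(plain));
        System.out.println(vectorToString(new NamedVector(plain, "/reut2-000.sgm-0.txt")));
    }
}
```

With this shape, unnamed vectors dump exactly as before and only named ones gain the prefix, so anything parsing the existing output would presumably need to tolerate the extra leading field.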

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.