[ https://issues.apache.org/jira/browse/MAHOUT-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914446#action_12914446 ]
Sean Owen commented on MAHOUT-402:
----------------------------------
Drew, is this related to what you just fixed for seq2sparse?
> NamedVectors are not readily identifiable in vectordumper output
> ----------------------------------------------------------------
>
> Key: MAHOUT-402
> URL: https://issues.apache.org/jira/browse/MAHOUT-402
> Project: Mahout
> Issue Type: Bug
> Components: Utils
> Affects Versions: 0.4
> Reporter: Drew Farris
> Priority: Minor
> Attachments: MAHOUT-402.patch
>
>
> When dumping a sequence file of (Writable, NamedVector) pairs using
> vectordumper in either JSON or standard format, it is not apparent in the
> output that the vectors are in fact NamedVectors.
> For example, after applying MAHOUT-401 to produce NamedVectors from
> seq2sparse, I run:
> {code}
> ./bin/mahout vectordump -j -p -s ~/mahout/reuters-out-seqdir-sparse/tf-vectors/part-00000
> {code}
> And get:
> {code}
> Input Path: /home/drew/mahout/reuters-out-seqdir-sparse/tf-vectors/part-00000
> /reut2-000.sgm-0.txt
> {"class":"org.apache.mahout.math.RandomAccessSparseVector","vector" [...]
> {code}
> or when removing the -j argument:
> {code}
> /reut2-000.sgm-0.txt elts: {1026:3.0, 16150:1.0, 3338:3.0, 16147:1.0,
> 3339:1.0, 12240:1.0, [...]
> {code}
> The first case, when dumping JSON, occurs because NamedVector simply calls
> its delegate's asFormatString method. Granted, the naive approach of
> implementing asFormatString in NamedVector also produces some nasty output:
> {code}
> /reut2-001.sgm-468.txt
> {"class":"org.apache.mahout.math.NamedVector","vector":"{\"delegate\":{\"class\":\"org.apache.mahout.math.RandomAccessSparseVector\"
> [...]
> {code}
> So a little more thought needs to be given to that approach.
> For the non-JSON format, VectorHelper.vectorToString(..) is the culprit.
> Would it be OK to do an instanceof NamedVector check here and emit the name?
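
For illustration, the instanceof idea proposed for the non-JSON format could look roughly like the sketch below. This is not the attached patch and not Mahout's actual VectorHelper code: SimpleVector, SparseVector and the nested NamedVector are hypothetical stand-ins so the example compiles without Mahout on the classpath, and the "name:" prefix is an assumed output format. Only the getName()/getDelegate() accessors mirror what the real NamedVector exposes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedVectorDumpSketch {

    /** Hypothetical stand-in for Mahout's Vector; only what the sketch needs. */
    interface SimpleVector {
        Map<Integer, Double> elements();
    }

    /** Hypothetical stand-in for a plain sparse vector. */
    static final class SparseVector implements SimpleVector {
        private final Map<Integer, Double> elements = new LinkedHashMap<>();

        SparseVector set(int index, double value) {
            elements.put(index, value);
            return this;
        }

        @Override
        public Map<Integer, Double> elements() {
            return elements;
        }
    }

    /** Hypothetical stand-in for NamedVector: a name wrapped around a delegate. */
    static final class NamedVector implements SimpleVector {
        private final SimpleVector delegate;
        private final String name;

        NamedVector(SimpleVector delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        String getName() {
            return name;
        }

        SimpleVector getDelegate() {
            return delegate;
        }

        @Override
        public Map<Integer, Double> elements() {
            return delegate.elements();
        }
    }

    /** Formats a vector as text, emitting the name first when one is present. */
    static String vectorToString(SimpleVector vector) {
        StringBuilder sb = new StringBuilder();
        if (vector instanceof NamedVector) {
            // Assumed format: the "name:" prefix is illustrative only.
            sb.append("name: ").append(((NamedVector) vector).getName()).append(' ');
        }
        sb.append("elts: {");
        boolean first = true;
        for (Map.Entry<Integer, Double> e : vector.elements().entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        SimpleVector plain = new SparseVector().set(1026, 3.0).set(16150, 1.0);
        System.out.println(vectorToString(plain));
        System.out.println(vectorToString(new NamedVector(plain, "/reut2-000.sgm-0.txt")));
    }
}
```

Plain vectors keep the existing "elts: {...}" shape, so only NamedVector output changes. The JSON case would need separate handling, since wrapping the delegate's format string is what produced the nested escaping shown above.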