[
https://issues.apache.org/jira/browse/MAHOUT-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014451#comment-14014451
]
ASF GitHub Bot commented on MAHOUT-1505:
----------------------------------------
Github user dlyubimov commented on a diff in the pull request:
https://github.com/apache/mahout/pull/5#discussion_r13257657
--- Diff: CHANGELOG ---
@@ -2,6 +2,8 @@ Mahout Change Log
Release 1.0 - unreleased
+ MAHOUT-1505: structure of clusterdump's JSON output (akm)
+
--- End diff --
I think it would be better to add this entry when (or after) you do the squash
pull. Otherwise you'll have to merge this file with other changes, since it is
guaranteed to be changed by other commits every time, although git will most
likely resolve the conflict automatically.
> structure of clusterdump's JSON output
> --------------------------------------
>
> Key: MAHOUT-1505
> URL: https://issues.apache.org/jira/browse/MAHOUT-1505
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.9
> Reporter: Terry Blankers
> Assignee: Andrew Musselman
> Labels: json
> Fix For: 1.0
>
> Attachments: MAHOUT-1505.patch
>
>
> Hi all, I'm working on some automated analysis of the clusterdump output
> using '-of = JSON'. While digging into the structure of the representation of
> the data I've noticed something that seems a little odd to me.
> For a particular cluster, the 'cluster', 'n', 'c' and 'r' values are all
> packed into one continuous string. For example:
> {noformat}
> {"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223,
> administration:0.011] r=[action:0.446, adherence:1.501,
> administration:0.306]}"}
> {noformat}
> This is also the case for the "point":
> {noformat}
> {"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904,
> harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138","weight":"1.0"}
> {noformat}
> This leads me to believe that the only way I can get to the individual data
> in these items is by string parsing. For JSON deserialization I would have
> expected to see something along the lines of:
> {noformat}
> {
>   "cluster": "VL-10515",
>   "n": 5924,
>   "c": [
>     {"action": 0.023},
>     {"adherence": 0.223},
>     {"administration": 0.011}
>   ],
>   "r": [
>     {"action": 0.446},
>     {"adherence": 1.501},
>     {"administration": 0.306}
>   ]
> }
> {noformat}
> and:
> {noformat}
> {
>   "point": {
>     "body": 6.904,
>     "harm": 10.101
>   },
>   "vector_name": "013FFD34580BA31AECE5D75DE65478B3D691D138",
>   "weight": 1.0
> }
> {noformat}
> Andrew Musselman replied:
> {quote}
> Looks like a bug to me as well. I would have expected something similar to
> what you describe, except perhaps something like this, which puts the "c"
> and "r" values in objects rather than in arrays of single-element objects:
> {noformat}
> {
>   "cluster": "VL-10515",
>   "n": 5924,
>   "c": {
>     "action": 0.023,
>     "adherence": 0.223,
>     "administration": 0.011
>   },
>   "r": {
>     "action": 0.446,
>     "adherence": 1.501,
>     "administration": 0.306
>   }
> }
> {noformat}
> {quote}
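Until the output format changes, the string-parsing workaround the reporter describes can be sketched roughly as follows. This is a minimal sketch, not Mahout code: the helper names (`parse_terms`, `restructure_cluster`, `restructure_point`) are hypothetical, and it assumes the exact layouts quoted in the issue (`ID{n=N c=[...] r=[...]}` with each list closed by `]`, and `NAME = [...]` for points), vectors written as `term:weight` pairs (not index-keyed or dense vectors), and no `", "` inside term names. It emits the object-based layout for "c" and "r" suggested in the reply.

```python
import json
import re

# Hypothetical helpers (not part of Mahout). They assume the string layouts
# quoted in this issue: "ID{n=N c=[term:w, ...] r=[term:w, ...]}" for clusters
# and "NAME = [term:w, ...]" for points, with ", " never occurring in a term.
CLUSTER_RE = re.compile(
    r"^(?P<id>[^{]+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}$"
)
POINT_RE = re.compile(r"^(?P<name>.+?) = \[(?P<vec>[^\]]*)\]$")


def parse_terms(text):
    """Turn 'action:0.023, adherence:0.223' into {'action': 0.023, ...}."""
    terms = {}
    for pair in text.split(", "):
        if not pair:
            continue  # an empty vector "[]" yields one empty piece
        # rpartition splits on the last ':' so the weight is always the suffix
        term, _, weight = pair.rpartition(":")
        terms[term] = float(weight)
    return terms


def restructure_cluster(record):
    """Rewrite {"cluster": "<string>"} into the object-based layout."""
    m = CLUSTER_RE.match(record["cluster"])
    if m is None:
        raise ValueError("unrecognized cluster string: %r" % record["cluster"])
    return {
        "cluster": m.group("id"),
        "n": int(m.group("n")),
        "c": parse_terms(m.group("c")),
        "r": parse_terms(m.group("r")),
    }


def restructure_point(record):
    """Rewrite a point record so the vector is an object of term weights."""
    m = POINT_RE.match(record["point"])
    if m is None:
        raise ValueError("unrecognized point string: %r" % record["point"])
    return {
        "point": parse_terms(m.group("vec")),
        "vector_name": record["vector_name"],
        "weight": float(record["weight"]),
    }


# Usage with the sample records from this issue (brackets balanced).
cluster_line = (
    '{"cluster":"VL-10515{n=5924 c=[action:0.023, adherence:0.223, '
    'administration:0.011] r=[action:0.446, adherence:1.501, '
    'administration:0.306]}"}'
)
point_line = (
    '{"point":"013FFD34580BA31AECE5D75DE65478B3D691D138 = [body:6.904, '
    'harm:10.101]","vector_name":"013FFD34580BA31AECE5D75DE65478B3D691D138",'
    '"weight":"1.0"}'
)

cluster = restructure_cluster(json.loads(cluster_line))
point = restructure_point(json.loads(point_line))
print(cluster["c"]["adherence"])  # 0.223
print(json.dumps(point))
```

The object layout is what makes `cluster["c"]["adherence"]` a direct lookup; with arrays of single-element objects, a consumer would instead have to scan the list for the element that carries the wanted key.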