So it sounds like there are a few things going on:

(1) The quick fix would be to revert to WeightedPropertyVectorWritable
so we can hold on to per-vector properties such as the key or the
distance to the centroid
(2) But WeightedPropertyVectorWritable is not general enough for how
people want to annotate vectors
(3) NamedVector should be factored out

Is this accurate?  Do people already have a concept of what it would look
like to handle vector properties more intelligently?
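
To make (1) concrete, here is a rough sketch of what carrying the
distance along with each clustered point buys us. It is plain Java, not
Mahout code: ClusteredPoint, DISTANCE_KEY, assign and sortByDistance are
names I made up for illustration, and the string-valued property map is
only my assumption about the shape of the data. The point is that the
distance is recorded once, at assignment time when the centroid is
already in hand, so ordering docs later needs no centroid lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a clustered point that carries free-form
// properties next to its weight. This is NOT the Mahout class.
public class ClusteredPointSketch {

  // Assumed property name; nothing in the thread fixes this key.
  static final String DISTANCE_KEY = "distance";

  static final class ClusteredPoint {
    final String key;
    final int clusterId;
    final double weight;
    final double[] vector;
    final Map<String, String> properties = new HashMap<>();

    ClusteredPoint(String key, int clusterId, double weight, double[] vector) {
      this.key = key;
      this.clusterId = clusterId;
      this.weight = weight;
      this.vector = vector;
    }
  }

  // Plain Euclidean distance; any distance measure would do here.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Store the distance as a property while the centroid is available.
  static ClusteredPoint assign(String key, int clusterId,
                               double[] vector, double[] centroid) {
    ClusteredPoint p = new ClusteredPoint(key, clusterId, 1.0, vector);
    p.properties.put(DISTANCE_KEY,
        Double.toString(euclidean(vector, centroid)));
    return p;
  }

  // Order points by the stored distance, nearest first, without
  // touching the centroids again. Returns a new list.
  static List<ClusteredPoint> sortByDistance(List<ClusteredPoint> points) {
    List<ClusteredPoint> sorted = new ArrayList<>(points);
    sorted.sort(Comparator.comparingDouble(
        (ClusteredPoint p) ->
            Double.parseDouble(p.properties.get(DISTANCE_KEY))));
    return sorted;
  }

  public static void main(String[] args) {
    double[] centroid = {0.0, 0.0};
    List<ClusteredPoint> points = new ArrayList<>();
    points.add(assign("doc-a", 0, new double[] {3.0, 4.0}, centroid));
    points.add(assign("doc-b", 0, new double[] {1.0, 0.0}, centroid));
    for (ClusteredPoint p : sortByDistance(points)) {
      System.out.println(p.key + " " + p.properties.get(DISTANCE_KEY));
    }
  }
}
```

A string-to-string map like this is also where (2) starts to bite:
every consumer has to agree on key names and parse the values back,
which is part of why a more general approach seems worth discussing.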



On Wed, Oct 30, 2013 at 5:53 PM, Grant Ingersoll (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809813#comment-13809813]
>
> Grant Ingersoll commented on MAHOUT-1030:
> -----------------------------------------
>
> Andrew, I suppose it depends on what part of it you want to address.  If
> it is the literal part of this bug, Pat has been pretty responsive.  If it
> is the reworking of the properties of vectors, that is probably best
> handled on the mailing list.  The basic gist being we want to more
> intelligently handle vector properties and get rid of NamedVector.
>  [~tdunning], [~robinanil] and others may have some thoughts here as well.
>
> (FWIW, I'd prefer the latter to be tackled.)
>
> > Regression: Clustered Points Should be WeightedPropertyVectorWritable
> not WeightedVectorWritable
> >
> ------------------------------------------------------------------------------------------------
> >
> >                 Key: MAHOUT-1030
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Clustering, Integration
> >    Affects Versions: 0.7
> >            Reporter: Jeff Eastman
> >            Assignee: Andrew Musselman
> >             Fix For: 1.0, 0.9
> >
> >         Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch,
> MAHOUT-1030.patch
> >
> >
> > Looks like this won't make it into this build. Pretty widespread impact
> on code and tests and I don't know which properties were implemented in the
> old version. I will create a JIRA and post my interim results.
> > On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > > That's a reversion that evidently got in when the new
> ClusterClassificationDriver was introduced. It should be a pretty easy fix
> and I will see if I can make the change before Paritosh cuts the release
> bits tonight.
> > >
> > > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> > >> It appears that in kmeans the clusteredPoints are now written as
> WeightedVectorWritable where in mahout 0.6 they were
> WeightedPropertyVectorWritable? This means that the distance from the
> centroid is no longer stored here? Why? I hope I'm wrong because that is
> not a welcome change. How is one to order clustered docs by distance from
> cluster centroid?
> > >>
> > >> I'm sure I could calculate the distance but that would mean looking
> up the centroid for the cluster id given in the above
> WeightedVectorWritable, which means iterating through all the clusters for
> each clustered doc. In my case the number of clusters could be fairly large.
> > >>
> > >> Am I missing something?
> > >>
> > >>
> > >
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1#6144)
>
