Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

KHATWANI PARTH BHARAT Wed, 26 Apr 2017 02:04:42 -0700

@Trevor and @Dmitriy

The tough bug in the aggregating transpose is fixed. One issue is still left,
and it is blocking completion of the KMeans code: assigning the "closest
cluster index" found for each row as that row's key in the DRM.
Consider the matrix of data points given as follows:


{
   0 => {0:1.0,    1: 1.0,    2: 1.0,   3: 3.0}
   1 => {0:1.0,    1: 2.0,    2: 3.0,   3: 4.0}
   2 => {0:1.0,    1: 3.0,    2: 4.0,   3: 5.0}
   3 => {0:1.0,    1: 4.0,    2: 5.0,   3: 6.0}
  }
Here 0, 1, 2 and 3 are the row keys. Column 0 contains the values that will
be used to store the count of points assigned to each cluster, and columns
1 to 3 contain the coordinates of the data points.

After the cluster assignment step of the KMeans algorithm, which @Dmitriy
outlined at the beginning of this mail chain, the above matrix should look
like this (assuming that the 0th and 1st data points are assigned to the
cluster with index 0, and the 2nd and 3rd data points to the cluster with
index 1):

 {
   0 => {0:1.0,    1: 1.0,    2: 1.0,   3: 3.0}
   0 => {0:1.0,    1: 2.0,    2: 3.0,   3: 4.0}
   1 => {0:1.0,    1: 3.0,    2: 4.0,   3: 5.0}
   1 => {0:1.0,    1: 4.0,    2: 5.0,   3: 6.0}
 }
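
For reference, the assignment step described above can be sketched in plain Scala, without any Mahout calls. The helper names (`squaredDistance`, `closestCentroid`) and the two centroids are assumptions made purely for illustration, chosen so that the points fall into the clusters assumed above; this is not the `findTheClosestCentriod` used in this thread.

```scala
object AssignmentSketch {
  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid closest to `point` (hypothetical helper).
  def closestCentroid(point: Array[Double], centroids: Array[Array[Double]]): Int =
    centroids.indices.minBy(i => squaredDistance(point, centroids(i)))

  // Columns 1 to 3 of the example matrix, i.e. the coordinates only.
  val points: Array[Array[Double]] = Array(
    Array(1.0, 1.0, 3.0),
    Array(2.0, 3.0, 4.0),
    Array(3.0, 4.0, 5.0),
    Array(4.0, 5.0, 6.0)
  )

  // Assumed centroids, picked only to reproduce the assignment above.
  val centroids: Array[Array[Double]] = Array(
    Array(1.5, 2.0, 3.5),
    Array(3.5, 4.5, 5.5)
  )

  // New row keys: one cluster index per data point.
  val keys: Array[Int] = points.map(p => closestCentroid(p, centroids))

  def main(args: Array[String]): Unit =
    println(keys.mkString(", ")) // prints: 0, 0, 1, 1
}
```

With these assumed centroids, rows 0 and 1 get key 0 and rows 2 and 3 get key 1, which is the keying shown in the matrix above.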

To achieve the above result I am using the following lines of code:

//11. Iterate over the data matrix (in DrmLike[Int] format)
dataDrmX.mapBlock() {
  case (keys, block) =>
    for (row <- 0 until block.nrow) {
      val dataPoint = block(row, ::)

      //12. findTheClosestCentriod finds the centroid closest to the data
      //    point specified by "dataPoint"
      val closestIndex = findTheClosestCentriod(dataPoint, centriods)

      //13. Assign the closest index as the row key
      keys(row) = closestIndex
    }
    keys -> block
}

But the result turns out to be:

 {
   0 => {0:1.0,    1: 2.0,    2: 3.0,   3: 4.0}
   1 => {0:1.0,    1: 4.0,    2: 5.0,   3: 6.0}
 }


Is there anything wrong with the syntax of the above code? I am unable
to find any reference on how I should assign a value to the
row keys.

@Trevor, as you mentioned earlier in this mail chain:
"Got it- in short no.

Think of the keys like a dictionary or HashMap.

That's why everything is ending up on row 1."
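
To make the dictionary analogy concrete, here is a small plain Scala sketch. It only illustrates what happens when rows with duplicate keys are put into a `Map`; it is an assumption that this mirrors the observed behaviour, and it makes no claim about how Mahout stores or collects a DRM internally. The object and value names are hypothetical.

```scala
object DuplicateKeySketch {
  // Rows after the assignment step: (cluster index, row values).
  val keyedRows: Seq[(Int, Vector[Double])] = Seq(
    0 -> Vector(1.0, 1.0, 1.0, 3.0),
    0 -> Vector(1.0, 2.0, 3.0, 4.0),
    1 -> Vector(1.0, 3.0, 4.0, 5.0),
    1 -> Vector(1.0, 4.0, 5.0, 6.0)
  )

  // A Map holds one value per key, so a later row replaces an earlier
  // row that carries the same key.
  val asMap: Map[Int, Vector[Double]] = keyedRows.toMap

  def main(args: Array[String]): Unit =
    asMap.toSeq.sortBy(_._1).foreach { case (k, row) => println(s"$k => $row") }
}
```

Only the last row seen for each key survives here, which matches the two-row result shown earlier.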

But according to the algorithm outlined by @Dmitriy at the start of the mail
chain, assigning the same key to multiple rows is possible.
The same is also mentioned in the book written by Dmitriy and Andrew:
rows having the same row key are summed up when we
take the aggregating transpose.
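
The "sum rows that share a key" behaviour I am expecting can be sketched in plain Scala as follows. This is only a model of the expected result, not Mahout code; dividing by column 0 to get the new centroid is my assumption about how the count column is meant to be used, and all names are hypothetical.

```scala
object AggregateByKeySketch {
  // Rows keyed by cluster index; column 0 is the per-point count (1.0).
  val keyedRows: Seq[(Int, Vector[Double])] = Seq(
    0 -> Vector(1.0, 1.0, 1.0, 3.0),
    0 -> Vector(1.0, 2.0, 3.0, 4.0),
    1 -> Vector(1.0, 3.0, 4.0, 5.0),
    1 -> Vector(1.0, 4.0, 5.0, 6.0)
  )

  // Element-wise sum of two rows of equal length.
  def add(a: Vector[Double], b: Vector[Double]): Vector[Double] =
    a.zip(b).map { case (x, y) => x + y }

  // Group the rows by key and sum each group element-wise.
  val sums: Map[Int, Vector[Double]] =
    keyedRows.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2).reduce(add) }

  // Column 0 now holds the number of points per cluster; dividing the
  // remaining columns by it gives the mean of each cluster.
  val newCentroids: Map[Int, Vector[Double]] =
    sums.map { case (k, s) => k -> s.tail.map(_ / s.head) }

  def main(args: Array[String]): Unit = {
    println(sums(0))         // Vector(2.0, 3.0, 4.0, 7.0)
    println(newCentroids(0)) // Vector(1.5, 2.0, 3.5)
  }
}
```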

I am now confused about whether what I have described above is possible,
is not possible, or is a bug in the API.



Thanks & Regards
Parth

On Tue, Apr 25, 2017 at 9:07 PM, Khurrum Nasim <khurrum.na...@useitc.com>
wrote:

> Can mahout be used for self driving tech ?
>
> Thanks,
>
> Khurrum.
