Hi,

You can go through the code of this project (https://github.com/zinnia-phatak-dev/Nectar) to understand how complex algorithms are implemented using M/R.
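
To give a rough idea of the shape for the K-Means question quoted below, here is a minimal sketch in plain Java, with no Hadoop classes at all. It is not taken from Nectar and it is not the only way to do this. All the names in it (KMeansSketch, nearest, mean, iterate, run) are made up for illustration, and points are plain double[]{x, y} arrays rather than Writable objects. The assumption is that nearest() plays the role of the map step (emit the index of the closest centroid for each point) and mean() plays the role of the reduce step (average the points that share an index). Map and reduce are not called recursively here: a driver loop runs one pass per iteration and feeds the new centroids into the next pass. K is not decided by the framework; in this sketch it is simply the number of initial centroids the caller passes in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: hypothetical names, no Hadoop API involved.
class KMeansSketch {

    // "Map" step: return the index of the centroid nearest to the point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dx = point[0] - centroids[i][0];
            double dy = point[1] - centroids[i][1];
            double d = dx * dx + dy * dy; // squared distance is enough to compare
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    // "Reduce" step: the new centroid is the mean of the points in one cluster.
    static double[] mean(List<double[]> points) {
        double sx = 0, sy = 0;
        for (double[] p : points) {
            sx += p[0];
            sy += p[1];
        }
        return new double[] { sx / points.size(), sy / points.size() };
    }

    // One iteration: group points by nearest centroid, then average each group.
    static double[][] iterate(double[][] points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] next = new double[centroids.length][];
        for (int i = 0; i < centroids.length; i++) {
            List<double[]> members = groups.get(i);
            // An empty cluster keeps its old centroid in this sketch.
            next[i] = (members == null) ? centroids[i].clone() : mean(members);
        }
        return next;
    }

    // Driver: the initial centroids are chosen once by the caller, then the
    // loop repeats iterations until nothing moves or maxIterations is reached.
    static double[][] run(double[][] points, double[][] initial, int maxIterations) {
        double[][] centroids = initial;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] next = iterate(points, centroids);
            boolean changed = !Arrays.deepEquals(next, centroids);
            centroids = next;
            if (!changed) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {10, 10}, {10, 11} };
        double[][] initial = { {0, 0}, {10, 10} }; // K = 2, supplied by the user
        System.out.println(Arrays.deepToString(run(points, initial, 10)));
    }
}
```

In a real Hadoop job the grouping done by the HashMap above would presumably be handled by the shuffle between map and reduce, and the current centroids would have to be made available to every mapper at the start of each iteration; how best to do that is left open here.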

On Fri, May 18, 2012 at 12:16 PM, Ravi Joshi <ravi.josh...@yahoo.com> wrote:

> I am writing my own map and reduce methods to implement the K-Means
> algorithm in Hadoop-1.0.1 in Java. I found some example links for
> K-Means in Hadoop on blogs, but I don't want to copy their code; as a
> learner I want to implement it myself. So I just need some ideas/clues.
> Below is the work I have already done.
>
> I have Point and Cluster classes, which are Writable. The Point class has
> an x coordinate, a y coordinate, and the Cluster to which the Point
> belongs. My Cluster class, on the other hand, has an ArrayList that stores
> all the Point objects belonging to that Cluster. The Cluster class also
> has a centroid variable. I hope I am on the right track (if not, please
> correct me).
>
> First of all, my input (a file containing point coordinates) must be
> loaded into Point objects. I mean this input file must be mapped to all
> the Points. This should be done ONCE in the map class (but how?). After
> assigning a value to each Point, some random Clusters must be chosen in
> the initial phase (this must be done only ONCE, but how?). Then every
> Point must be mapped to all the clusters, with the distance between that
> point and each centroid. In the reduce method, every Point will be checked
> and assigned to the Cluster nearest to it (by comparing the distances).
> Then a new centroid is calculated for each Cluster. (Should map and reduce
> be called recursively? If yes, where would all the initialization go? By
> initialization I mean providing input to the Point objects (which must be
> done ONCE initially) and choosing random centroids (which also has to be
> done ONCE initially).)
>
> One more question: should the value of the parameter K (which decides the
> total number of clusters) be assigned by the user, or will Hadoop decide
> it itself?
>
> Somebody please explain this to me. I don't need the code; I want to
> write it myself. I need a way. Thank you.
>
> -Ravi

--
https://github.com/zinnia-phatak-dev/Nectar