Reusing jobs

Karl Wettin Thu, 17 Apr 2008 18:00:51 -0700

Is it possible to execute a job more than once?

I use map reduce when adding a new instance to a hierarchical cluster tree. It finds the least distant node and inserts the new instance as a sibling to that node.
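
For illustration, here is a minimal local sketch of that insertion step in plain Python. The Node class, the Euclidean distance, and the choice to group the nearest leaf and the new leaf under a new branch node are all assumptions made for the example; the actual tree implementation is not shown in this message.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Node:
    """Hypothetical tree node: leaves carry an instance, branches only children."""
    instance: Optional[Sequence[float]] = None
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed distance measure; any metric could be substituted.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf (node holding an instance) below the given node."""
    if node.instance is not None:
        yield node
    for child in node.children:
        yield from leaves(child)


def insert(root: Node, instance: Sequence[float]) -> Node:
    """Insert an instance as a sibling of the least distant existing leaf."""
    new_leaf = Node(instance=instance)
    existing = list(leaves(root))
    if not existing:
        # Empty tree: the first instance hangs directly under the root.
        new_leaf.parent = root
        root.children.append(new_leaf)
        return new_leaf
    nearest = min(existing, key=lambda n: euclidean(n.instance, instance))
    # Assumption: a new branch takes the nearest leaf's place and holds
    # both the nearest leaf and the new leaf, making them siblings.
    parent = nearest.parent
    branch = Node(parent=parent)
    parent.children[parent.children.index(nearest)] = branch
    branch.children = [nearest, new_leaf]
    nearest.parent = branch
    new_leaf.parent = branch
    return new_leaf
```

Each call compares the new instance only against the leaves already in the tree.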

As far as I know, it is in the very nature of this algorithm that one inserts one instance at a time; this is how the second dimension is created that makes it better than a vector cluster. It would be possible to map all permutations of instances and skip the reduction, but that would result in many more calculations than iteratively training the tree, as the latter only requires one to test against the instances already inserted in the tree.

Iteratively training this tree using Hadoop means executing one job per instance. Each job measures the distance to all instances in a file, and I append the new instance to that file once it has been inserted in the tree.
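
The shape of one such job can be sketched in plain Python, with ordinary functions standing in for the map and reduce phases. No Hadoop API is used here; the function names, the single "nearest" key, and the (id, vector) record layout are assumptions made for the example.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, Sequence[float]]  # assumed layout: (instance id, vector)


def map_distance(record: Record, new_instance: Sequence[float]):
    """Map phase: emit the distance from one stored instance to the new one."""
    instance_id, vector = record
    distance = math.dist(vector, new_instance)  # Euclidean, Python 3.8+
    # A single shared key sends every distance to the same reducer.
    return ("nearest", (distance, instance_id))


def reduce_min(pairs: Iterable[Tuple[str, Tuple[float, str]]]):
    """Reduce phase: keep only the smallest (distance, id) pair."""
    return min(value for _key, value in pairs)


def run_job(instances: List[Record], new_instance: Sequence[float]):
    """One 'job': find the id of the stored instance nearest to the new one."""
    pairs = [map_distance(record, new_instance) for record in instances]
    distance, nearest_id = reduce_min(pairs)
    return nearest_id, distance


# One job per inserted instance, then append it to the "file".
instances: List[Record] = [("a", (0.0, 0.0)), ("b", (10.0, 10.0))]
nearest_id, _ = run_job(instances, (1.0, 1.0))
instances.append(("c", (1.0, 1.0)))
```

Every inserted instance pays the full job startup cost again, even though the map and reduce logic is identical from one run to the next.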

All of the above is very inefficient, especially with a young tree that could be trained in nanoseconds locally. So I train locally until it takes 20 seconds to insert an instance.
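
A rough sketch of that switch, assuming "that" means training locally and that the hand-over to Hadoop is permanent once a single local insert reaches the threshold. The insert_local and insert_distributed callables are hypothetical placeholders, not part of any library.

```python
import time
from typing import Callable, Iterable, Sequence

Instance = Sequence[float]


def train(
    instances: Iterable[Instance],
    insert_local: Callable[[Instance], None],
    insert_distributed: Callable[[Instance], None],
    threshold_seconds: float = 20.0,
) -> None:
    """Insert locally until one insert takes too long, then hand over."""
    use_local = True
    for instance in instances:
        if use_local:
            started = time.perf_counter()
            insert_local(instance)
            elapsed = time.perf_counter() - started
            # Assumption: once a local insert reaches the threshold,
            # every later instance goes to the distributed job.
            if elapsed >= threshold_seconds:
                use_local = False
        else:
            insert_distributed(instance)
```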

But really, this is all Hadoop framework overhead. I'm not quite sure of everything it does when I execute a job, but it seems like quite a lot. And all I'm doing is executing a couple of identical jobs over and over again using new data.


It would be very nice if it just took a few milliseconds to do that.


      karl
