Hi,

I'm having trouble running KMeansDriver, and I suspect the problem is related to Adil's message. My environment recently switched to Hadoop 0.20, so I can no longer use hod as a scheduler. I also have to specify the queue, which unfortunately is not named default; this is normally done with -Dmapred.job.queue.name. Is there any way to use Mahout, specifically the clustering code, in this setup? When I run KMeansDriver with the -D option, I get the following error message:
09/10/28 01:09:21 ERROR kmeans.KMeansDriver: Exception
org.apache.commons.cli2.OptionException: Unexpected -D while processing Options

On 9/14/09 3:19 PM, "Adil Aijaz" <[email protected]> wrote:

Hi folks,

I recently merged my vendor branch of Mahout with Mahout trunk and found that Mahout now supports Hadoop 0.20. With Hadoop 0.20, we can use the capacity scheduler instead of hod. There are two ways to pass the capacity scheduler queue name to a Mahout driver class like KMeansDriver:

1. Have KMeansDriver extend 'Configured' and implement the 'Tool' interface, which allows the scheduler queue name to be specified on the command line, as in -Dmapred.job.queue.name=myqueuename
2. Add a jobConf.set() call while setting up the drivers.

Personally, I prefer the first solution. Are there any plans to update the various driver classes to support capacity scheduler queues? Either way, I can help out in the process.

Adil
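
To make option 1 concrete: the OptionException above appears because the driver's own option parser sees -D before anything has removed it. With the Tool approach, Hadoop's ToolRunner (through GenericOptionsParser) strips the -D properties off the command line and puts them into the Configuration before the driver parses the rest. Below is a minimal plain-Java sketch of that splitting step only, so it runs without Hadoop on the classpath. The class QueueOptionSketch, its split() method, and the --input argument are hypothetical names for illustration; this is not Mahout or Hadoop code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: mimics the idea of separating generic
// -D properties from driver-specific arguments. Not Mahout or Hadoop code.
public class QueueOptionSketch {

    /** Holds the -D properties and the arguments left for the driver. */
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Accepts both "-Dkey=value" and "-D key=value"; everything else is passed through. */
    static Parsed split(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair;
            if (arg.equals("-D") && i + 1 < args.length && args[i + 1].indexOf('=') > 0) {
                pair = args[++i];            // "-D key=value": consume the next argument
            } else if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                pair = arg.substring(2);     // "-Dkey=value": drop the "-D" prefix
            } else {
                parsed.remaining.add(arg);   // not a property, leave it for the driver
                continue;
            }
            int eq = pair.indexOf('=');
            parsed.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed parsed = split(new String[] {
            "-Dmapred.job.queue.name=myqueuename", "--input", "in"
        });
        // In a real driver, the properties would be copied into the job
        // configuration and only the remaining arguments would reach the
        // driver's own option parser.
        System.out.println("properties: " + parsed.properties);
        System.out.println("remaining:  " + parsed.remaining);
    }
}
```

In a real driver none of this would be written by hand: the class would extend Configured, implement Tool, and be launched through ToolRunner.run(), so that getConf() already contains mapred.job.queue.name when the job is set up.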
