Re: Serializing code to nodes: no can do?

2007-04-24 Thread Doug Cutting
Pedro Guedes wrote: For this I need to be able to register new steps in my chain and pass them to Hadoop to execute as a MapReduce job. I see two choices here: 1 - build a .job archive (main-class: mycrawler, submits jobs through JobClient) with my new steps and dependencies in the 'lib/'
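
The first option amounts to shipping a driver class inside the archive and naming it in the manifest. A minimal sketch of such a driver using the classic org.apache.hadoop.mapred API of that era; the package, class, and path names are illustrative, and IdentityMapper stands in for a real crawl step:

    package mycrawler;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    // Driver named as Main-Class in the .job archive's manifest; Hadoop's
    // RunJar unpacks the archive and puts the jars under lib/ on the classpath.
    public class CrawlerDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CrawlerDriver.class);
        conf.setJobName("crawl-step");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(IdentityMapper.class); // stand-in for a crawl step
        conf.setInputPath(new Path(args[0]));
        conf.setOutputPath(new Path(args[1]));
        JobClient.runJob(conf); // submits the job and waits for completion
      }
    }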

Re: Serializing code to nodes: no can do?

2007-04-23 Thread Pedro Guedes
I'm trying to pass a crawling chain (the sequence of steps to execute while crawling a resource) as configuration, and then execute that chain for each resource I find in my crawl database (like Nutch's crawldb). For this I need to be able to register new steps in my chain and pass them to Hadoop to execute as a MapReduce job.
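
One way to make steps registrable, assuming each step class is already on the tasks' classpath (e.g. shipped in the job archive's lib/): store the chain as a comma-separated list of class names in the JobConf and instantiate the steps reflectively on the task side. The property key and helper class below are invented for illustration:

    import org.apache.hadoop.mapred.JobConf;

    public class ChainConfig {
      // Hypothetical property under which the chain is stored.
      public static final String KEY = "crawler.chain.steps";

      public static void setSteps(JobConf conf, String... stepClasses) {
        StringBuilder sb = new StringBuilder();
        for (String c : stepClasses) {
          if (sb.length() > 0) sb.append(',');
          sb.append(c);
        }
        conf.set(KEY, sb.toString());
      }

      public static Object[] loadSteps(JobConf conf) throws Exception {
        String value = conf.get(KEY, "");
        if (value.length() == 0) return new Object[0];
        String[] names = value.split(",");
        Object[] steps = new Object[names.length];
        for (int i = 0; i < names.length; i++) {
          // getClassByName uses the task's classloader, which already
          // includes jars from the job archive's lib/ directory.
          steps[i] = conf.getClassByName(names[i].trim()).newInstance();
        }
        return steps;
      }
    }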

Serializing code to nodes: no can do?

2007-04-18 Thread Pedro Guedes
Hi hadoopers, I'm working on an enterprise search engine that runs on a Hadoop cluster but is controlled from the outside. I managed to implement a simple crawler much like Nutch's... Now I have a new system requirement: the crawl process must be configurable outside Hadoop. This means that I

Re: Serializing code to nodes: no can do?

2007-04-18 Thread Pedro Guedes
I keep talking to myself... hope it doesn't annoy you too much! We thought of a solution to our problem in which we build a new .job file, in accordance with our crawl configuration, and then pass it to Hadoop for execution... Is there somewhere I can look for the specification of the .job format?
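
There is no separate spec to speak of: a .job file is an ordinary jar. Hadoop's RunJar unpacks it into a temporary directory, puts the unpacked root, its classes/ subdirectory, and every jar under lib/ on the classpath, then invokes the class named by the manifest's Main-Class (or the class given on the command line). A sketch of assembling one programmatically with java.util.jar; the entry and file names are hypothetical:

    import java.io.*;
    import java.util.jar.*;

    public class JobFileBuilder {
      public static void main(String[] args) throws IOException {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "mycrawler.CrawlerDriver");

        JarOutputStream out =
            new JarOutputStream(new FileOutputStream("crawler.job"), mf);
        add(out, "mycrawler/CrawlerDriver.class", "build/mycrawler/CrawlerDriver.class");
        add(out, "lib/crawl-steps.jar", "deps/crawl-steps.jar"); // pluggable steps
        out.close();
      }

      // Copy one file into the archive under the given entry name.
      private static void add(JarOutputStream out, String entry, String file)
          throws IOException {
        out.putNextEntry(new JarEntry(entry));
        FileInputStream in = new FileInputStream(file);
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
        in.close();
        out.closeEntry();
      }
    }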

Re: Serializing code to nodes: no can do?

2007-04-18 Thread Michael Bieniosek
I'm not sure exactly what you're trying to do, but you can specify command-line parameters to hadoop jar, which you can interpret in your code. Your code can then write arbitrary config parameters before starting the MapReduce job. Based on these configs, you can load specific jars in your mapreduce
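
The task-side half of this approach is reading those parameters back in configure(). A sketch using the classic org.apache.hadoop.mapred API; the property name "crawler.chain" and the mapper's behavior are invented for illustration:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CrawlMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private String[] chain = new String[0];

      // Called once per task with the job's configuration, so any
      // parameter the driver set before submission is visible here.
      public void configure(JobConf conf) {
        String value = conf.get("crawler.chain", "");
        if (value.length() > 0) chain = value.split(",");
      }

      public void map(LongWritable key, Text url,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        // Apply each configured step to the resource (details elided).
        for (String step : chain) {
          reporter.setStatus("applying " + step + " to " + url);
        }
        out.collect(url, new Text("crawled"));
      }
    }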