Hello,

Sorry for cross-posting.

We have a compute cluster running the Torque resource manager and the Maui scheduler. The cluster is almost always full, but at times (early mornings, late nights, holidays) resources become available in pockets of 2-10 nodes for 2-5 hours. Our idea is to set up Nutch (Hadoop) so that it uses these pockets: an automated system in which a long crawling job is broken down into smaller map/reduce jobs. The system would constantly monitor resource availability and would request, execute, and finalize these smaller tasks through the resource manager interface. We looked at HOD (Hadoop On Demand), but as far as I understand it, it does not serve this purpose.
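To make the idea concrete, here is a minimal Python sketch of the planning step only: given one idle pocket, decide which pending crawl segments fit into it and build the matching Torque request. Everything named here (`Pocket`, `pick_tasks`, the `safety` margin, the `run_segment.sh` script, and the per-segment cost estimates in node-hours) is a hypothetical placeholder, not part of Nutch, Hadoop, or Torque. The only real interface assumed is the usual `qsub -l nodes=N,walltime=HH:MM:SS` resource request; how the pocket itself is detected is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pocket:
    """A block of idle nodes and how long it stays free (hypothetical structure)."""
    nodes: int
    hours: float


def format_walltime(hours: float) -> str:
    """Convert fractional hours to the HH:MM:SS form used in walltime requests."""
    total_seconds = int(round(hours * 3600))
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def pick_tasks(pocket: Pocket,
               task_node_hours: List[float],
               safety: float = 0.8) -> Tuple[List[int], float]:
    """Greedily pick pending tasks whose estimated cost fits the pocket.

    Only a fraction `safety` of the pocket's node-hours is budgeted, leaving
    room for start-up and finalization. Assumes (optimistically) that a task's
    cost in node-hours can be spread evenly over the nodes.
    Returns the indices of the chosen tasks and the node-hours they use.
    """
    budget = pocket.nodes * pocket.hours * safety
    chosen: List[int] = []
    used = 0.0
    for index, cost in enumerate(task_node_hours):
        if used + cost <= budget:
            chosen.append(index)
            used += cost
    return chosen, used


def qsub_command(pocket: Pocket, script: str) -> List[str]:
    """Build (but do not run) a qsub command requesting the whole pocket."""
    resources = f"nodes={pocket.nodes},walltime={format_walltime(pocket.hours)}"
    return ["qsub", "-l", resources, script]


if __name__ == "__main__":
    pocket = Pocket(nodes=4, hours=2.0)
    pending = [3.0, 2.0, 2.0, 1.0]  # estimated node-hours per crawl segment
    chosen, used = pick_tasks(pocket, pending)
    print("segments to run:", chosen, "node-hours used:", used)
    print("command:", " ".join(qsub_command(pocket, "run_segment.sh")))
```

A monitoring loop would call something like this each time a pocket appears, submit the command, and mark the chosen segments as in progress. The greedy choice is only the simplest option; a real system would also need to handle segments that overrun their walltime.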
We realize this is a lot to ask and that a complete solution may not exist, but any pointers or links are more than welcome. We are also looking at JobStream.py, available at http://wiki.apache.org/nutch/Automating_Fetches_with_Python

Thanks
--
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC