Gabriele, I think a script like this is a good idea, but your proposal could be improved. It currently works only on a single machine and uses commands such as mv and ls, which won't work on a pseudo-distributed or fully distributed cluster. You should use the 'hadoop fs' commands instead. If I understand the script correctly, you then merge different crawldbs. Why do you do that? There should be one crawldb per crawl, so I don't think this is necessary at all.
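To illustrate the 'hadoop fs' point, here is a minimal sketch of how the script's file operations could be routed through the Hadoop filesystem shell. The helper names (fs_ls, fs_mv) and the HADOOP_CMD variable are hypothetical and not taken from your script; only 'hadoop fs -ls' and 'hadoop fs -mv' are the actual Hadoop commands.

```shell
#!/bin/sh
# Hypothetical helpers: send file operations through 'hadoop fs'
# instead of the local ls and mv commands.

# Assumption: the 'hadoop' launcher is on the PATH; override HADOOP_CMD if not.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# List a directory on the filesystem Hadoop is configured to use.
fs_ls() {
    $HADOOP_CMD fs -ls "$1"
}

# Move or rename a path on that same filesystem.
fs_mv() {
    $HADOOP_CMD fs -mv "$1" "$2"
}
```

A call such as fs_mv crawl/segments/old crawl/segments/new (paths made up for illustration) should then behave the same in local mode and on a cluster, since 'hadoop fs' resolves paths against whichever filesystem the Hadoop configuration points to.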
Having a script would definitely be a plus for beginners and would give more flexibility than the crawl command.

Thanks

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

