Hi hadoopers, I'm working on an enterprise search engine that runs on a Hadoop cluster but is controlled from the outside. I've managed to implement a simple crawler, much like Nutch's. Now I have a new system requirement: the crawl process must be configurable from outside Hadoop. That means I should be able to add steps to the crawling process that the cluster will execute without knowing beforehand what they are. Since plain Java serialization isn't an option here, is there another way to achieve the same effect?
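
To make the question concrete, here is roughly the plug-in pattern I'm aiming for; CrawlStep, StepRunner, and the crawl.step.class property are all names I made up for this sketch. It follows the same pattern Hadoop itself uses to hand mapper/reducer classes to tasks (a class name in the Configuration, instantiated via ReflectionUtils):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical plug-in point: each externally supplied crawl step
// would implement this interface.
interface CrawlStep {
    void execute(Configuration conf) throws Exception;
}

class StepRunner {
    // The class *name* travels in the job Configuration as plain text.
    // "crawl.step.class" is a property name I invented for this sketch.
    static CrawlStep loadStep(Configuration conf) {
        Class<? extends CrawlStep> cls =
            conf.getClass("crawl.step.class", null, CrawlStep.class);
        if (cls == null) {
            throw new IllegalStateException("crawl.step.class is not set");
        }
        // Instantiates the step on the node at runtime; the bytecode
        // must already be on that node's classpath for this to work.
        return ReflectionUtils.newInstance(cls, conf);
    }
}

This only works if the class is already on every node's classpath, which is exactly the part I don't know how to solve.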
Using Writable means the implementations already have to be present on each node so they can read the object's data from HDFS... but then I just get the same object back, not a new implementation, right? As far as I can tell, Writable serializes an object's data, not its code, so what I really need is a way to get the new step's bytecode onto the nodes at submission time.
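
The closest thing I've found so far is shipping a jar through DistributedCache when the job is submitted. Something like the sketch below is what I have in mind (the jar path and the com.example.MyCrawlStep class name are made up), but I'm not sure it's the right approach:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class SubmitWithPlugin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Jar containing the new CrawlStep implementation, uploaded
        // to HDFS by the external controller beforehand (path made up).
        Path pluginJar = new Path("/plugins/my-crawl-step.jar");

        // Adds the jar to the task classpath on every node, so the
        // class can then be loaded by name at runtime.
        DistributedCache.addFileToClassPath(pluginJar, conf);

        // Tell the job which implementation to instantiate.
        conf.set("crawl.step.class", "com.example.MyCrawlStep");

        // ...then build and submit the job with this conf as usual.
    }
}

Any thoughts will be appreciated, Pedro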
