Hi,

Nutch 2.x is different because it uses external data stores to hold all crawled data.
However, the steps to run a crawl are the same:

  inject
  loop:
    generate, fetch, parse, update
  invert links, index

Only the arguments passed to each step may differ. Have a look at:
  https://wiki.apache.org/nutch/Nutch2Tutorial
and the bin/crawl script, which is provided for both 1.x and 2.x.
The differences should be obvious.

But may I ask why you do not keep using Nutch 1.x, which is still
maintained and in some respects even better than 2.x?

Cheers,
Sebastian

On 03/23/2016 06:06 PM, Sabah Sajjad Khan wrote:
> Hello,
>
> We worked with nutch1.x for a project and were able to successfully crawl the way we want. Our
> project now requires us to use nutch2.x and we seem to see a lack of documentation to help. We are
> able to inject but have no idea what to do next. Is it the same as nutch1.x? Any help would be
> appreciated as we are students and have been struggling for a good 2 months now!
>
> Thank you
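
For reference, the step order named in the reply (inject once, repeat generate/fetch/parse/update, then invert links and index) can be sketched as a small Python helper. This is only an illustration of the ordering: the function name `crawl_steps` and the `rounds` parameter are hypothetical, and the strings are step labels taken from the reply, not actual bin/nutch arguments, which differ between 1.x and 2.x and should be taken from the bin/crawl script.

```python
from typing import List

# Step labels as listed in the reply. They are NOT bin/nutch arguments;
# the real arguments differ between Nutch 1.x and 2.x (see bin/crawl).
LOOP_STEPS = ["generate", "fetch", "parse", "update"]
FINAL_STEPS = ["invert links", "index"]


def crawl_steps(rounds: int) -> List[str]:
    """Return the ordered step labels for a crawl with `rounds` loop iterations.

    `rounds` is a hypothetical parameter for how often the
    generate/fetch/parse/update loop is repeated.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    steps = ["inject"]              # seed URLs are injected once, up front
    for _ in range(rounds):
        steps.extend(LOOP_STEPS)    # one full crawl round
    steps.extend(FINAL_STEPS)       # run once after the loop
    return steps


if __name__ == "__main__":
    for number, step in enumerate(crawl_steps(2), start=1):
        print(f"{number:2d}. {step}")
```

Each label would then be mapped to the matching command and arguments for the Nutch version in use.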

