Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "Nutch2Crawling" page has been changed by FerdyGalema:
http://wiki.apache.org/nutch/Nutch2Crawling?action=diff&rev1=2&rev2=3

Comment:
fix typo

   * GeneratorJob
   * FetcherJob
   * ParserJob (optionally done during fetch using 'fetcher.parse')
-  * DbUpdateJob
+  * DbUpdaterJob

  To populate initial rows for the webtable you can use the InjectorJob.

  There is a single table '''webpage''' that is the input and output for these jobs. Every row in this table is an url (WebPage). To group urls from the same TLD and domain closely together, the row key is stored as url with '''reversed host components'''. This takes advantage of the fact that row keys are sorted (in most NoSQL stores). Scanning over a subset is generally a lot faster than scanning over the entire table with specific rowkey filtering. See the following example rowkey listing:
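
The listing from the wiki page is not reproduced in this notification. As a rough illustration of the idea only, the sketch below reverses the host components of a URL and shows that sorted keys place URLs from the same domain next to each other. The helper name `reverse_host_key`, the sample URLs, and the exact key layout (reversed host, then `:scheme`, optional `:port`, then path and query) are assumptions made for this example; they are not taken from the page and should not be read as the Nutch implementation.

```python
from urllib.parse import urlsplit


def reverse_host_key(url: str) -> str:
    """Hypothetical helper: build a row key with reversed host components.

    The layout used here (host reversed, then ':scheme', optional ':port',
    then path and query) is an assumption for illustration.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # "www.example.com" -> "com.example.www"
    reversed_host = ".".join(reversed(host.split(".")))
    key = f"{reversed_host}:{parts.scheme}"
    if parts.port is not None:
        key += f":{parts.port}"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return key + path


# Made-up sample URLs, deliberately listed out of order.
urls = [
    "http://www.example.org/about",
    "http://news.example.com/",
    "http://example.com/index.html",
    "http://www.example.com/a",
]

# A sorted store keeps rows in key order, so keys sharing a prefix are adjacent.
keys = sorted(reverse_host_key(u) for u in urls)

# Selecting one domain is a contiguous prefix range in the sorted keys,
# rather than a filter applied to every row in the table.
example_com = [k for k in keys if k.startswith("com.example")]

for k in keys:
    print(k)
```

In a store with sorted row keys, the `com.example` subset corresponds to a start/stop key range, which is why a scan over it can avoid touching the rest of the table.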

