Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "bin/nutch readlinkdb" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch%20readlinkdb

Comment:
Update to reflect Nutch 1.3 API

New page:
readlinkdb is the bin/nutch alias for the class org.apache.nutch.crawl.LinkDbReader.

This reader class lets us obtain various information from within a linkdb. There are two types of information we can retrieve:
'''i.''' A dump of the whole linkdb, written to a text file for easy viewing.
'''ii.''' Information relating to a specific URL.
/!\ :TODO: More could be added to the above, e.g. the nature and structure of the information we retrieve from a dump of the linkdb and for a specific URL. /!\
Usage: 
{{{
bin/nutch readlinkdb <linkdb> (-dump <out_dir> | -url <url>)
}}}
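The parentheses and the pipe in the usage line mean that exactly one of -dump or -url follows <linkdb>. As a minimal sketch of that rule, the shell wrapper below checks its arguments against the usage line and only then hands them to bin/nutch readlinkdb. The function name readlinkdb and the NUTCH variable are hypothetical illustrations, not part of Nutch, and crawl/linkdb is only an assumed crawl layout.

{{{
# Hypothetical wrapper (not part of Nutch): checks arguments against
#   <linkdb> (-dump <out_dir> | -url <url>)
# and, if they match, runs bin/nutch readlinkdb with them.
# NUTCH may point to another launcher; it defaults to bin/nutch.
readlinkdb() {
  if [ "$#" -ne 3 ]; then
    echo "Usage: readlinkdb <linkdb> (-dump <out_dir> | -url <url>)" >&2
    return 1
  fi
  case "$2" in
    -dump|-url) "${NUTCH:-bin/nutch}" readlinkdb "$1" "$2" "$3" ;;
    *) echo "Error: expected -dump or -url, got $2" >&2; return 1 ;;
  esac
}

# Example calls (crawl/linkdb is an assumed path):
#   readlinkdb crawl/linkdb -dump linkdb_dump
#   readlinkdb crawl/linkdb -url http://nutch.apache.org/
}}}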

'''<linkdb>''': The linkdb directory we wish to read and obtain information from.


'''-dump <out_dir>''': Dumps the whole linkdb as text into the <out_dir> directory of our choice.
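If the dump is written by a Hadoop job, as is usual for Nutch 1.x tools, the text typically ends up in one or more part-* files inside <out_dir> rather than in a single file; that naming is an assumption here, so check your own <out_dir> first. A minimal sketch for merging those files into one text file for viewing, where merge_linkdb_dump is a hypothetical helper:

{{{
# Hypothetical helper: concatenate the part-* files of a linkdb dump
# (assumed Hadoop-style output naming) into a single text file.
# Hidden .crc files are not matched by the part-* glob.
merge_linkdb_dump() {
  out_dir=$1
  dest=$2
  set -- "$out_dir"/part-*
  if [ ! -e "$1" ]; then
    echo "No part-* files found in $out_dir" >&2
    return 1
  fi
  # The glob expands in sorted order, so part-00000 comes before part-00001.
  cat "$@" > "$dest"
}

# Example: merge_linkdb_dump linkdb_dump linkdb_dump.txt
}}}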


'''-url <url>''': The -url argument prints information about the specific <url> to System.out.
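Since -url takes a single URL per invocation, one way to check several URLs is to loop over them, and because the result goes to System.out it can be redirected to a file as usual. A minimal sketch, where inlinks_for_urls and the NUTCH variable are hypothetical names:

{{{
# Hypothetical helper: run "readlinkdb <linkdb> -url <url>" once per URL,
# printing a "== <url>" header before each result.
# NUTCH may point to another launcher; it defaults to bin/nutch.
inlinks_for_urls() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: inlinks_for_urls <linkdb> <url> [<url> ...]" >&2
    return 1
  fi
  linkdb=$1
  shift
  for url in "$@"; do
    echo "== $url"
    "${NUTCH:-bin/nutch}" readlinkdb "$linkdb" -url "$url" || return 1
  done
}

# Example (assumed paths and URLs):
#   inlinks_for_urls crawl/linkdb http://a.example/ http://b.example/ > inlinks.txt
}}}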



CommandLineOptions
