Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JamesVictor:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

The comment on the change is:
removed index-more from example; it threw an exception on my indexing

------------------------------------------------------------------------------
  
  Edit `conf/nutch-site.xml` and change the value of `plugin.includes` to 
include the plugins for the document types that you want Nutch to handle.
  
- For example, to add parsing for PDF, MS Office, and OpenOffice documents, and use the `index-more` instead of `index-basic`, you'll have something like:
+ Example: to add parsing for PDF, MS Office, and OpenOffice documents, you'll have something like:
  
  {{{
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|
- index-more|query-(basic|site|url)|summary-basic|scoring-opic|
+ index-basic|query-(basic|site|url)|summary-basic|scoring-opic|
  urlnormalizer-(pass|regex|basic)</value>
  </property>
  }}}
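
A quick way to sanity-check a `plugin.includes` value before indexing is to test it against the plugin ids you expect to load. The sketch below is only an illustration: it assumes Nutch treats the value as a regular expression that must match a plugin id in full, and that whitespace inside `<value>` can be ignored. The helper names `plugin_includes` and `is_included` are hypothetical and not part of Nutch.

```python
import re
import xml.etree.ElementTree as ET

# Sample config mirroring the example above, with the value on one line.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|msexcel|mspowerpoint|msword|oo|pdf|swf|zip)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>"""


def plugin_includes(xml_text):
    """Return the compiled plugin.includes pattern from a nutch-site.xml string."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "plugin.includes":
            value = prop.findtext("value") or ""
            # Assumption: line breaks and spaces inside <value> are not significant.
            return re.compile("".join(value.split()))
    raise KeyError("plugin.includes is not set")


def is_included(pattern, plugin_id):
    """Assumption: a plugin is loaded only if the pattern matches its whole id."""
    return pattern.fullmatch(plugin_id) is not None


if __name__ == "__main__":
    pattern = plugin_includes(SAMPLE)
    for plugin_id in ("parse-pdf", "index-basic", "index-more"):
        print(plugin_id, is_included(pattern, plugin_id))
```

To check a real installation, read `conf/nutch-site.xml` from disk and pass its contents to `plugin_includes` instead of `SAMPLE`.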
