Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "IndexStructure" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=17&rev2=18

  
  The index structure formed after indexing is shown below:
  
- ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin''' ||'''Comment'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''||
- ||    boost    ||     YES     ||      Not Indexed     || scoring-opic/link || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. ||
+ ||    boost    ||     YES     ||      Not Indexed     || various scoring plugins || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. ||
- ||    digest  ||      YES     ||      Not Indexed     ||  /!\ NEEDS COMMENT /!\|| Adds a '''message digest''' field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content. ||
+ ||    digest  ||      YES     ||      Not Indexed     || org.apache.nutch.indexer.IndexerMapReduce.java || Adds a '''message digest''' field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content. ||
  ||    lang    ||      YES     ||      Un-Tokenized    ||      language-identifier || Adds a '''lang''' (language) field to a document. ||
- ||    segment ||              YES     ||      Not Indexed     || /!\ NEEDS COMMENT /!\ || Adds the originating '''segment''' field to the document, used to identify the most recent segment in which this document was fetched. ||
+ ||    segment ||              YES     ||      Not Indexed     || org.apache.nutch.indexer.IndexerMapReduce.java || Adds the originating '''segment''' field to the document, used to identify the most recent segment in which this document was fetched. ||
  ||    tstamp  ||      YES     ||      Tokenized       || /!\ NEEDS COMMENT /!\ || Adds a '''timestamp''' field recording the most recent time this document was fetched. ||
  ||    cc:license      ||      YES     ||      Indexed, Tokenized      || creativecommons || Adds the entire license as '''cc:license=xxx''' and '''attributes''' extracted from the license URL. ||
  ||    cc:meta ||      YES     ||      Indexed, Tokenized      ||      creativecommons || Adds the license location as '''cc:meta=xxx'''. ||
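The digest row describes the field as possibly "MD5 over content and headers". As a minimal, hypothetical sketch (written in Python rather than Nutch's Java, and not the actual code in `IndexerMapReduce`), the stored fields from the table could be assembled like this. The function names `md5_digest` and `build_index_fields`, the header serialization, the hex encoding of the digest, and the `tstamp` string format are all assumptions made for illustration; the `cc:*` fields are omitted.

```python
import hashlib
from datetime import datetime, timezone


def md5_digest(content, headers=None):
    """Hex-encoded MD5 over the raw content bytes, optionally followed by headers.

    Assumption: headers are serialized as sorted "name: value" lines. This is a
    hypothetical choice for the sketch, not Nutch's real signature format.
    """
    h = hashlib.md5()
    h.update(content)
    for name, value in sorted((headers or {}).items()):
        h.update(f"{name}: {value}\n".encode("utf-8"))
    return h.hexdigest()


def build_index_fields(content, headers, score, segment, fetch_time, lang):
    """Hypothetical helper: a dict keyed by the field names from the table."""
    return {
        "boost": score,                          # webgraph score (stored, not indexed)
        "digest": md5_digest(content, headers),  # message digest (stored, not indexed)
        "lang": lang,                            # language code (stored, un-tokenized)
        "segment": segment,                      # originating segment name
        # Assumed timestamp format; the table does not specify one.
        "tstamp": fetch_time.strftime("%Y%m%d%H%M%S"),
    }


if __name__ == "__main__":
    doc = build_index_fields(
        content=b"<html>hello</html>",
        headers={"Content-Type": "text/html"},
        score=1.0,
        segment="20110101123456",  # example segment name
        fetch_time=datetime(2011, 1, 1, 12, 34, 56, tzinfo=timezone.utc),
        lang="en",
    )
    print(doc["tstamp"])  # 20110101123456
```

Hashing the headers after the content means two fetches with identical bodies but different headers get different digests; dropping the `headers` argument gives a content-only digest instead.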
