Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "IndexStructure" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=11&rev2=12

  
  The index structure produced by indexing is shown below:
  
- ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Indexing Filter/Plugin''' ||'''Comment'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin''' ||'''Comment'''||
- ||    boost    ||     YES     ||      Not Indexed     ||      Indexer || ||
- ||    digest  ||      YES     ||      Not Indexed     ||      Indexer || ||
+ ||    boost    ||     YES     ||      Not Indexed     || /!\ NEEDS COMMENT /!\ || /!\ NEEDS COMMENT /!\ ||
+ ||    digest  ||      YES     ||      Not Indexed     || /!\ NEEDS COMMENT /!\ || /!\ NEEDS COMMENT /!\ ||
  ||    lang    ||      YES     ||      Un-Tokenized    ||      language-identifier || Adds a '''lang''' (language) field to a document.||
- ||    segment ||              YES     ||      Not Indexed     ||      Indexer || ||
- ||    tstamp  ||      YES     ||      Tokenized       ||      Indexer || ||
+ ||    segment ||              YES     ||      Not Indexed     || /!\ NEEDS COMMENT /!\ || /!\ NEEDS COMMENT /!\ ||
+ ||    tstamp  ||      YES     ||      Tokenized       || /!\ NEEDS COMMENT /!\ || /!\ NEEDS COMMENT /!\ ||
  ||    cc:license      ||      YES     ||      Indexed, Tokenized      || creativecommons || Adds the entire license as '''cc:license=xxx''' and '''attributes''' extracted from the license URL ||
  ||    cc:meta ||      YES     ||      Indexed, Tokenized      ||      creativecommons || Adds the license location as '''cc:meta=xxx''' ||
  ||    cc:type ||      YES     ||      Indexed, Tokenized      ||      creativecommons || Adds the work type as '''cc:type=xxx'''||
@@ -21, +21 @@

  ||    content         ||      NO      ||      Tokenized       ||      index-basic     || Adds basic searchable '''content field''' to a document. ||
  ||    lastModified    ||      NO      ||      Indexed, Un-Tokenized   ||      index-more || Adds time-related metadata in the form of '''last-modified''', if present. ||
  ||    date    ||      NO      ||      Indexed, Un-Tokenized   ||      index-more || Indexes the date as last-modified or, if that is not present, the fetch time. ||
- ||    contentLength   ||      NO      ||      Indexed, Un-Tokenized   ||      index-more || /!\ NEEDS COMMENT/!\ ||
+ ||    contentLength   ||      NO      ||      Indexed, Un-Tokenized   ||      index-more || /!\ NEEDS COMMENT /!\ ||
  ||    type    ||      NO      ||      Indexed, Un-Tokenized   ||      index-more      || Adds contentType, primaryType and subType (all MIME types) ||
  ||    primaryType     ||      NO      ||      Indexed, Un-Tokenized   ||      index-more      ||      primaryType (mime-type) ||
  ||    subType         ||      NO      ||      Indexed, Un-Tokenized   ||      index-more      ||      subType (mime-type) ||
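The Stored and Index columns can be read as three flags per field. Below is a rough illustration in Python. It is a sketch, not Nutch or Lucene code. It encodes three rows from the table and shows what each flag combination contributes. Stored values come back with results. Tokenized values become several search terms. Un-tokenized values stay one exact-match term. `FieldSpec`, `SCHEMA`, `index_terms` and `stored_value` are hypothetical names. The lowercase word split is an assumed stand-in for a real analyzer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    stored: bool     # value is kept and can be returned with a hit
    indexed: bool    # value produces searchable terms
    tokenized: bool  # value is split into terms before indexing


# Three rows from the table, encoded as hypothetical specs.
SCHEMA = {
    "digest": FieldSpec(stored=True, indexed=False, tokenized=False),
    "lang": FieldSpec(stored=True, indexed=True, tokenized=False),
    "content": FieldSpec(stored=False, indexed=True, tokenized=True),
}


def index_terms(field: str, value: str) -> list:
    """Return the terms a field value would contribute to the index.

    The naive lowercase word split is an assumption standing in for a
    real analyzer; it is not the analyzer Nutch uses.
    """
    spec = SCHEMA[field]
    if not spec.indexed:
        return []  # not searchable; the value may still be stored
    if spec.tokenized:
        return re.findall(r"\w+", value.lower())
    return [value]  # the whole value is a single exact-match term


def stored_value(field: str, value: str) -> Optional[str]:
    """Return the value kept for display in results, or None if not stored."""
    return value if SCHEMA[field].stored else None
```

For example, a content value of "Apache Nutch crawler" yields three terms. A lang value of "en" stays one term. A digest value yields no terms but is still returned with the hit.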
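The index-more rows for date, type, primaryType and subType describe two small derivations. The date falls back from last-modified to fetch time. The content type is split into its two halves. The sketch below makes some assumptions: timestamps are milliseconds since the epoch, and the content type is a plain "primary/sub" string with optional parameters. `index_date` and `mime_fields` are hypothetical helpers, not the plugin's API. Following the type row, the type field is modeled as multi-valued. It holds the full content type plus both halves.

```python
from typing import Optional


def index_date(last_modified: Optional[int], fetch_time: int) -> int:
    """Value for the 'date' field (hypothetical helper).

    Prefers last-modified and falls back to the fetch time when
    last-modified is absent. Times are assumed to be epoch milliseconds.
    """
    return last_modified if last_modified is not None else fetch_time


def mime_fields(content_type: str) -> dict:
    """Values for 'type', 'primaryType' and 'subType' (hypothetical helper).

    Parameters such as '; charset=utf-8' are dropped before splitting.
    A type without '/' yields an empty subType.
    """
    base = content_type.split(";", 1)[0].strip()
    primary, _, sub = base.partition("/")
    return {
        "type": [base, primary, sub],  # multi-valued, per the type row
        "primaryType": primary,
        "subType": sub,
    }
```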
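The cc:license row mentions attributes extracted from the license URL. The table does not spell out the extraction rule, so the following is one assumed reading. Split the URL path on '/' and '-', drop the leading "licenses" segment, and keep the remaining pieces. `license_attributes` is a hypothetical name. Which field the attributes end up in is not shown here.

```python
import re
from urllib.parse import urlparse


def license_attributes(license_url: str) -> list:
    """Extract attribute tokens from a Creative Commons license URL.

    Assumed rule (hypothetical): split the path on '/' and '-', discard
    empty pieces and a leading 'licenses' segment, keep the rest.
    """
    parts = [p for p in re.split(r"[/-]", urlparse(license_url).path) if p]
    if parts and parts[0] == "licenses":
        parts = parts[1:]
    return parts
```

Under this reading, http://creativecommons.org/licenses/by-nc-sa/2.0/ gives the attributes by, nc, sa and 2.0.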
