The "IndexStructure" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=7&rev2=8

  
  The index structure formed after indexing is shown below : 
  
- ||'''FieldName'''||'''Stored'''||'''Index'''|| '''IndexingFilter''' ||'''Comment'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Indexing Filter/Plugin''' ||'''Comment'''||
- || boost || YES || NotIndexed || Indexer || ||
+ || boost || YES || Not Indexed || Indexer || ||
- || digest || YES || NotIndexed || Indexer || ||
+ || digest || YES || Not Indexed || Indexer || ||
- || lang || YES || UnTokenized || language-identifier || ||
+ || lang || YES || Un-Tokenized || language-identifier || ||
- || segment || YES || NotIndexed || Indexer || ||
+ || segment || YES || Not Indexed || Indexer || ||
  || tstamp || YES || Tokenized || Indexer || ||
  || anchor || NO || Tokenized || index-anchor || Indexing filter that indexes all inbound '''anchor text''' for a document.||
  || title || YES || Tokenized || index-basic || Adds basic searchable '''title field''' to a document. Also indexed by index-more ||
- || site || NO || UnTokenized || index-basic || Adds basic searchable '''site field''' to a document. ||
+ || site || NO || Un-Tokenized || index-basic || Adds basic searchable '''site field''' to a document. ||
  || host || NO || Tokenized || index-basic || Adds basic searchable '''hostname field''' to a document. ||
  || url || YES || Tokenized || index-basic || Adds basic searchable '''URL field''' to a document. ||
  || content || NO || Tokenized || index-basic || Adds basic searchable '''content field''' to a document. ||
  || lastModified || YES || NotIndexed || index-more || ||
- || date || NO || UnTokenized || index-more || ||
+ || date || NO || Un-Tokenized || index-more || ||
- || contentLength || YES || NotIndexed || index-more || ||
+ || contentLength || YES || Not Indexed || index-more || ||
- || type || NO || UnTokenized || index-more || contentType,primaryType,subType (all mime-types) ||
+ || type || NO || Un-Tokenized || index-more || contentType,primaryType,subType (all mime-types) ||
- || primaryType || YES || UnTokenized || index-more || primaryType (mime-type) ||
+ || primaryType || YES || Un-Tokenized || index-more || primaryType (mime-type) ||
- || subType || YES || UnTokenized || index-more || subType (mime-type) ||
+ || subType || YES || Un-Tokenized || index-more || subType (mime-type) ||
- || tld || YES || UnTokenized / NotStored(based on conf) || tld || see http://issues.apache.org/jira/browse/NUTCH-439 ||
+ || tld || YES || Un-Tokenized / NotStored(based on conf) || tld || see http://issues.apache.org/jira/browse/NUTCH-439 ||
- || category || NO || UnTokenized || index-url-category || see http://issues.apache.org/jira/browse/NUTCH-386 ||
+ || category || NO || Un-Tokenized || index-url-category || see http://issues.apache.org/jira/browse/NUTCH-386 ||
  || subcollection || YES || Tokenized || subcollection || see subcollection plugin ||
  
  ----
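
For readers who want to work with the table programmatically, the following is a minimal sketch that parses rows of the `||cell||cell||` wiki table form shown above into records and separates stored-but-unindexed fields from searchable-but-unstored ones. The names `FieldSpec` and `parse_row` are hypothetical helpers introduced here for illustration; they are not part of Nutch or the wiki software, and the sample rows are copied from the revised table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One row of the index structure table (hypothetical helper type)."""
    name: str
    stored: bool
    index: str
    plugin: str
    comment: str


def parse_row(row: str) -> FieldSpec:
    """Parse one five-column table row written as ||a||b||c||d||e||."""
    # Splitting on "||" yields an empty string before the first delimiter
    # and after the last one; drop both, then trim cell padding.
    cells = [cell.strip() for cell in row.strip().split("||")[1:-1]]
    if len(cells) != 5:
        raise ValueError(f"expected 5 cells, got {len(cells)}: {row!r}")
    name, stored, index, plugin, comment = cells
    return FieldSpec(name, stored.upper() == "YES", index, plugin, comment)


# Sample rows taken from the revised ("+") side of the table.
ROWS = [
    "|| boost || YES || Not Indexed || Indexer || ||",
    "|| lang || YES || Un-Tokenized || language-identifier || ||",
    "|| anchor || NO || Tokenized || index-anchor || Indexing filter that "
    "indexes all inbound '''anchor text''' for a document.||",
    "|| content || NO || Tokenized || index-basic || Adds basic searchable "
    "'''content field''' to a document. ||",
]

specs = [parse_row(row) for row in ROWS]

# Fields kept in the index for retrieval only (stored, never searched).
stored_only = [s.name for s in specs if s.stored and s.index == "Not Indexed"]

# Fields that can be searched but whose original value is not kept.
searchable_not_stored = [
    s.name for s in specs if not s.stored and s.index == "Tokenized"
]

print(stored_only)
print(searchable_not_stored)
```

The sketch assumes every row has exactly five columns, as in the table above; a row with a different column count raises `ValueError` rather than being silently misread.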
