Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "IndexStructure" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=17&rev2=18

  The index structure formed after indexing is shown below :
- ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin''' ||'''Comment'''||
+ ||'''Field Name'''||'''Stored'''||'''Index'''|| '''Plugin/Class''' ||'''Comment'''||
- || boost || YES || Not Indexed || scoring-opic/link || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. ||
+ || boost || YES || Not Indexed || various scoring plugins || Adds a '''score''' value field to a particular document. This is allocated based upon its importance within the webgraph. ||
- || digest || YES || Not Indexed || /!\ NEEDS COMMENT /!\|| Adds a '''message digest''' field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content. ||
+ || digest || YES || Not Indexed || org.apache.nutch.indexer.IndexerMapReduce.java || Adds a '''message digest''' field to a document. Can be MD5 over content and headers or more sophisticated text profile of the content. ||
  || lang || YES || Un-Tokenized || language-identifier || Add a '''lang''', language field to a document.||
- || segment || YES || Not Indexed || /!\ NEEDS COMMENT /!\ || Adds the originating '''segment''' field to the document, used to identify the most recent segment in which this document was fetched. ||
+ || segment || YES || Not Indexed || org.apache.nutch.indexer.IndexerMapReduce.java || Adds the originating '''segment''' field to the document, used to identify the most recent segment in which this document was fetched. ||
  || tstamp || YES || Tokenized || /!\ NEEDS COMMENT /!\ || Adds a '''timestamp''' field of the most recent time this document was fetched ||
  || cc:license || YES || Indexed, Tokenized || creativecommons || Adds the entire license as '''cc:license=xxx''' and '''attributes''' extracted of the license url||
  || cc:meta || YES || Indexed, Tokenized || creativecommons || Adds the license location as '''cc:meta=xxx''' ||
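The digest row describes the field as possibly "MD5 over content and headers". As a rough illustration only, the sketch below computes a lowercase hex MD5 over a sequence of byte arrays (for example page content followed by serialized headers) using the standard `java.security.MessageDigest` API. The class `DigestSketch`, the helper `md5Hex` and the sample input are hypothetical. This is not Nutch's own signature code, and exactly which bytes Nutch hashes depends on its configured signature implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {

    /**
     * Hypothetical helper (not a Nutch API): feeds each byte array into a single
     * MD5 digest in order, e.g. page content then serialized headers, and
     * returns the 32-character lowercase hex string.
     */
    static String md5Hex(byte[]... parts) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with standard JDKs; fail loudly on a runtime that lacks it.
            throw new IllegalStateException("MD5 not available", e);
        }
        for (byte[] part : parts) {
            md.update(part); // incremental: same as hashing the concatenation
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input; real content and headers come from a fetched page.
        byte[] content = "<html><body>Hello Nutch</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] headers = "Content-Type: text/html\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("digest = " + md5Hex(content, headers));
    }
}
```

Because `MessageDigest.update` is incremental, hashing content and headers as two parts gives the same value as hashing their concatenation, so identical input bytes always produce the same digest field.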

