Luis Lopez created NUTCH-2032:
---------------------------------

             Summary: Plugin to index the raw content of a readable document. 
                 Key: NUTCH-2032
                 URL: https://issues.apache.org/jira/browse/NUTCH-2032
             Project: Nutch
          Issue Type: New Feature
          Components: indexer, parser
    Affects Versions: 1.10
            Reporter: Luis Lopez
             Fix For: 1.11


This is related to https://issues.apache.org/jira/browse/NUTCH-1785 and 
https://issues.apache.org/jira/browse/NUTCH-1458

We created a couple of plugins to index the raw content of readable documents.
When these plugins are included in the plugin chain, the raw content of a
readable document (e.g. XML, HTML, CSV, TXT) is indexed. The index-rawcontent
plugin is not designed to index binary files; however, having the full content
of an HTML, XML, or CSV document is critical for some of us.
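The behaviour described above (index the raw text of readable types, skip
binary ones) can be sketched in plain Java. This is a hypothetical
illustration only. It does not reproduce the plugin attached to this issue or
Nutch's IndexingFilter API. The class name RawContentSketch, the
"rawcontent" field name, the MIME-type whitelist, and the UTF-8 decoding are
all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not Nutch's IndexingFilter API. */
public class RawContentSketch {

    // Assumed whitelist of "readable" MIME types (XML, HTML, CSV, TXT).
    private static final Set<String> READABLE_TYPES = Set.of(
            "text/html", "application/xhtml+xml",
            "text/xml", "application/xml",
            "text/csv", "text/plain");

    /** True if the base MIME type (parameters such as charset removed) is whitelisted. */
    public static boolean isReadable(String contentType) {
        if (contentType == null) {
            return false;
        }
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return READABLE_TYPES.contains(base);
    }

    /**
     * Adds the raw bytes, decoded as text, under the hypothetical "rawcontent"
     * field. Binary (non-whitelisted) content is left out. UTF-8 decoding is an
     * assumption; a real filter would use the document's detected charset.
     */
    public static Map<String, String> addRawContent(
            Map<String, String> doc, String contentType, byte[] raw) {
        if (raw != null && isReadable(contentType)) {
            doc.put("rawcontent", new String(raw, StandardCharsets.UTF_8));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        byte[] csv = "id,name\n1,nutch\n".getBytes(StandardCharsets.UTF_8);
        addRawContent(doc, "text/csv; charset=UTF-8", csv);
        System.out.println(doc.containsKey("rawcontent")); // prints "true"
    }
}
```

A whitelist is used rather than a binary blacklist so that unknown types are
skipped by default, matching the stated goal of not indexing binary files.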

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
