[ https://issues.apache.org/jira/browse/NUTCH-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1458:
----------------------------------------

    Fix Version/s: 1.7

> Support for raw HTML field added to Solr
> ----------------------------------------
>
>                 Key: NUTCH-1458
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1458
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, parser
>    Affects Versions: 1.5.1
>            Reporter: Max Dzyuba
>              Labels: html, nutch, raw, solr
>             Fix For: 1.7
>
>
> At the moment, the “content” field holds only the parsed text from the page.
> It would be nice to have a separate field in Solr document that would hold
> raw HTML from the crawled page.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira