[ 
https://issues.apache.org/jira/browse/NUTCH-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1018.
-----------------------------------------

    Resolution: Won't Fix

It looks like a plugin is the solution here. Closing as Won't Fix.
                
> Solr Document Size Limit
> ------------------------
>
>                 Key: NUTCH-1018
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1018
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Mark Achee
>            Priority: Minor
>              Labels: solr
>
> There should be an option, perhaps named solr.content.limit, that defines the
> maximum size of documents added to Solr.  I've had issues with large
> documents in Solr, so I set file.content.limit to 2MB.  However, this causes
> many files (mostly PDFs) to fail parsing, because only part of each document
> is retrieved.  With this new option, I could still parse them correctly but
> index only the first 2MB (or whatever the limit is set to) in Solr.
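
The truncation step described in the request could be sketched roughly as below. This is only an illustration, not Nutch or Solr code: the class name ContentLimiter and the method truncateUtf8 are hypothetical, and the sketch assumes the limit is counted in UTF-8 bytes of the already-parsed text. A custom indexing plugin might apply something like it to the extracted content before the document is sent to Solr, while the fetch limit stays high enough for parsing to succeed.

```java
public class ContentLimiter {

    /**
     * Returns the longest prefix of text whose UTF-8 encoding fits in
     * maxBytes, never cutting a character (or surrogate pair) in half.
     * Hypothetical helper; not part of any Nutch or Solr API.
     */
    public static String truncateUtf8(String text, int maxBytes) {
        if (text == null || maxBytes <= 0) {
            return "";
        }
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // Number of bytes this code point occupies in UTF-8.
            int size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + size > maxBytes) {
                break; // the next character would exceed the limit
            }
            bytes += size;
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        // Assumed limit of 2MB, matching the figure in the report.
        int limit = 2 * 1024 * 1024;
        String parsedContent = "text extracted from a fully parsed PDF";
        System.out.println(truncateUtf8(parsedContent, limit));
    }
}
```

Counting bytes per code point keeps the result valid text even when the limit falls in the middle of a multi-byte character.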

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
