I have made a patch for that purpose
(https://issues.apache.org/jira/browse/NUTCH-1317?focusedCommentId=13749989&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13749989).
If you like, you can set a content limit per MIME type for nutch-2.1 in
nutch-site.xml as follows:

Default limit property:

<property>
  <name>http.content.limit</name>
  <value>65536</value>
</property>

For example, for application/pdf:

<property>
  <name>http.content.limit.application.pdf</name>
  <value>1000</value>
</property>

For example, for text/plain:

<property>
  <name>http.content.limit.text.plain</name>
  <value>1000</value>
</property>

...
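
The property names above appear to follow a simple pattern: the MIME type is appended to http.content.limit with "/" replaced by ".". Below is a minimal Java sketch of a lookup with fallback to the default limit, under that assumption. It is not the code from the patch; the class ContentLimitSketch, the helpers keyFor and limitFor, and the use of a plain Map in place of the Nutch configuration object are all hypothetical and only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ContentLimitSketch {

    // Hypothetical helper: builds the property name for a MIME type by
    // replacing "/" with ".", as the examples in this message suggest.
    static String keyFor(String mimeType) {
        return "http.content.limit." + mimeType.replace('/', '.');
    }

    // Returns the per-type limit if one is configured, otherwise the
    // default http.content.limit (65536 is assumed if that is missing too).
    static int limitFor(Map<String, String> conf, String mimeType) {
        String fallback = conf.getOrDefault("http.content.limit", "65536");
        return Integer.parseInt(conf.getOrDefault(keyFor(mimeType), fallback));
    }

    public static void main(String[] args) {
        // Stand-in for the properties set in nutch-site.xml.
        Map<String, String> conf = new HashMap<>();
        conf.put("http.content.limit", "65536");
        conf.put("http.content.limit.application.pdf", "1000");

        System.out.println(limitFor(conf, "application/pdf")); // per-type limit
        System.out.println(limitFor(conf, "text/html"));       // falls back to default
    }
}
```

With this kind of lookup, a MIME type that has no property of its own simply uses the default limit.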
