Yossi Tamari created NUTCH-2511:
-----------------------------------
Summary: SitemapProcessor limited by http.content.limit
Key: NUTCH-2511
URL: https://issues.apache.org/jira/browse/NUTCH-2511
Project: Nutch
Issue Type: Bug
Components: sitemap
Affects Versions: 1.14
Reporter: Yossi Tamari
Because SitemapProcessor uses the HTTP protocol plugin, which limits the size
of a response to http.content.limit (64KB by default), it can only handle
sitemaps smaller than that size.
I don't believe that is what users intend when they set http.content.limit:
they want to limit document size, not sitemap size. The spec specifically says
that sitemaps can be up to 50MB.
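
To illustrate the effect, here is a minimal Java sketch of reading a response
body under a size limit. It is not the actual Nutch code: the class
ContentLimitSketch, the method readUpTo and the idea of a separate
SITEMAP_LIMIT are all hypothetical, used only to show how a 64KB cap cuts off
a larger sitemap while a sitemap-specific cap would not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ContentLimitSketch {
    // Default http.content.limit as cited in this report (64KB).
    static final int DOCUMENT_LIMIT = 64 * 1024;
    // Maximum sitemap size cited in this report (50MB).
    // A separate sitemap limit is an assumption, not an existing Nutch setting.
    static final int SITEMAP_LIMIT = 50 * 1024 * 1024;

    /** Reads at most `limit` bytes from `in`; a negative limit disables truncation. */
    static byte[] readUpTo(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (limit < 0 || out.size() < limit) {
            // Never request more bytes than the remaining budget allows.
            int want = limit < 0 ? buf.length : Math.min(buf.length, limit - out.size());
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sitemap = new byte[200 * 1024]; // stand-in for a 200KB sitemap body

        byte[] truncated = readUpTo(new ByteArrayInputStream(sitemap), DOCUMENT_LIMIT);
        byte[] complete = readUpTo(new ByteArrayInputStream(sitemap), SITEMAP_LIMIT);

        // Prints "65536 204800": the document limit truncates the sitemap.
        System.out.println(truncated.length + " " + complete.length);
    }
}
```

A truncated sitemap is typically no longer well-formed XML, so the
consequences go beyond missing the URLs past the cut-off point.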
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)