Yossi Tamari created NUTCH-2511:
-----------------------------------

             Summary: SitemapProcessor limited by http.content.limit
                 Key: NUTCH-2511
                 URL: https://issues.apache.org/jira/browse/NUTCH-2511
             Project: Nutch
          Issue Type: Bug
          Components: sitemap
    Affects Versions: 1.14
            Reporter: Yossi Tamari


Because SitemapProcessor uses the HTTP protocol plugin, which limits the size 
of a response to http.content.limit (64KB by default), it can only handle 
sitemaps smaller than that size. 

I don't believe that is what users intend when they set http.content.limit: 
they want to limit document size, not sitemap size. The sitemap protocol 
specification explicitly allows sitemaps of up to 50MB.
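
For illustration, a possible stopgap is to raise the limit in nutch-site.xml 
for the sitemap run. This is only a sketch, not a proposed fix: the value 
52428800 (50MB in bytes) is an assumption chosen to match the sitemap size 
limit, and the same setting also raises the limit for regular documents 
fetched with this configuration.

```xml
<!-- nutch-site.xml: overrides the default of 65536 bytes (64KB) -->
<property>
  <name>http.content.limit</name>
  <!-- Assumed value: 50 * 1024 * 1024 bytes, the maximum sitemap size -->
  <value>52428800</value>
</property>
```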



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
