Yossi Tamari created NUTCH-2511:

             Summary: SitemapProcessor limited by http.content.limit
                 Key: NUTCH-2511
                 URL: https://issues.apache.org/jira/browse/NUTCH-2511
             Project: Nutch
          Issue Type: Bug
          Components: sitemap
    Affects Versions: 1.14
            Reporter: Yossi Tamari

Because SitemapProcessor uses the HTTP protocol plugin, which limits the size 
of a response to http.content.limit (64KB by default), it can only handle 
sitemaps smaller than that size. 
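
To illustrate the effect, here is a minimal sketch in plain Java. It is not
Nutch code: readWithLimit is a hypothetical helper that assumes a fetcher which
simply stops reading once the limit is reached, and the example.com URLs are
made up. A sitemap larger than the limit comes back cut off mid-document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ContentLimitSketch {

    // Hypothetical helper (an assumption, not the Nutch implementation):
    // reads at most `limit` bytes from the stream and ignores the rest.
    static byte[] readWithLimit(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int remaining = limit;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                break; // end of stream reached before the limit
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a made-up sitemap that is clearly larger than 64KB.
        StringBuilder sb = new StringBuilder("<urlset>\n");
        for (int i = 0; i < 2000; i++) {
            sb.append("<url><loc>https://example.com/page/").append(i).append("</loc></url>\n");
        }
        sb.append("</urlset>");
        byte[] sitemap = sb.toString().getBytes(StandardCharsets.UTF_8);

        int limit = 65536; // the 64KB default mentioned above
        byte[] fetched = readWithLimit(new ByteArrayInputStream(sitemap), limit);
        String text = new String(fetched, StandardCharsets.UTF_8);

        System.out.println("original bytes: " + sitemap.length);
        System.out.println("fetched bytes:  " + fetched.length);
        System.out.println("complete:       " + text.endsWith("</urlset>"));
    }
}
```

In this sketch the fetched content stops at the limit and no longer ends with
the closing urlset tag, so the URLs beyond that point are never seen.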

I don't believe that is what users intend when they set http.content.limit:
they want to limit document size, not sitemap size. The sitemap specification
explicitly allows sitemaps of up to 50MB.

This message was sent by Atlassian JIRA