Problems managing outlinks with large url length
------------------------------------------------
Key: NUTCH-802
URL: https://issues.apache.org/jira/browse/NUTCH-802
Project: Nutch
Issue Type: Bug
Components: parser
Reporter: Pablo Aragón

Nutch can stall while collecting outlinks if an outlink's URL is too long. The maximum URL sizes for the main web servers are:

* Apache: 4,000 bytes
* Microsoft Internet Information Server (IIS): 16,384 bytes
* Perl HTTP::Daemon: 8,000 bytes

URLs longer than 4,000 bytes are problematic, so a maximum URL length should be set in the nutch-default.xml configuration file.

I have attached a patch.
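
The length limit proposed in this report could look roughly like the sketch below. It is only an illustration and not the attached patch: the class name, the property name `parser.outlink.max.url.length`, and the choice to measure length as UTF-8 bytes are all assumptions made for this example. The 4,000-byte default is the threshold named in the report.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OutlinkLengthFilter {

    // Hypothetical property name for nutch-default.xml; not taken from the patch.
    public static final String MAX_LENGTH_PROPERTY = "parser.outlink.max.url.length";

    // Threshold named in the report; a negative value disables the check.
    public static final int DEFAULT_MAX_LENGTH = 4000;

    /** Returns the URLs whose UTF-8 encoded size does not exceed maxLength bytes. */
    public static List<String> filterByLength(List<String> urls, int maxLength) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url == null) {
                continue; // skip missing entries instead of failing
            }
            int size = url.getBytes(StandardCharsets.UTF_8).length;
            if (maxLength < 0 || size <= maxLength) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a URL that is just over the default limit.
        StringBuilder longUrl = new StringBuilder("http://example.com/?q=");
        while (longUrl.length() <= DEFAULT_MAX_LENGTH) {
            longUrl.append('x');
        }
        List<String> urls = new ArrayList<>();
        urls.add("http://example.com/");
        urls.add(longUrl.toString());
        // Only the short URL is expected to survive the filter.
        System.out.println(filterByLength(urls, DEFAULT_MAX_LENGTH));
    }
}
```

In a real parser the limit would presumably be read from the configuration once and applied before each outlink is stored, so that overly long URLs are dropped early instead of being processed further.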