[ 
https://issues.apache.org/jira/browse/NUTCH-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028943#comment-18028943
 ] 

ASF GitHub Bot commented on NUTCH-3099:
---------------------------------------

lewismc opened a new pull request, #865:
URL: https://github.com/apache/nutch/pull/865

   This PR updates the patch at 
https://issues.apache.org/jira/browse/NUTCH-3099. It:
   
   - formats the patch (removes additional whitespace from lines, 2-space 
indents)
   - adds a license header to the new `TestHttpBase.java`
   - adds missing imports to `TestHttpBase.java`
   
   The reason I created this PR is that I wanted to exercise the JUnit test CI.
   
   I think we should also update the following description in 
`nutch-default.xml`
   
   ```
   <property>
     <name>http.proxy.exception.list</name>
     <value></value>
     <description>A comma separated list of hosts that don't use the proxy
     (e.g. intranets). Example: www.apache.org</description>
   </property>
   ```
   
   Maybe something like
   
   ```
   <property>
     <name>http.proxy.exception.list</name>
     <value></value>
     <description>A comma-separated list of hosts that don't use the proxy
     (e.g. intranets). Each entry is either i) a host name, e.g.
     domain1.org,www.domain2.com, or ii) a pattern with a wildcard '*' as
     prefix, e.g. "*.domain.com", or as suffix, e.g. "some.domain.*".</description>
   </property>
   ```
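   
   To make the intended semantics concrete, here is a minimal sketch of 
   prefix/suffix wildcard matching for such a list. It is an illustration 
   only and not the code from the attached patch: the class and method names 
   (`ProxyExceptionSketch`, `parse`, `bypassesProxy`) are hypothetical, and 
   case-insensitive comparison is an assumption.
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Locale;
   import java.util.stream.Collectors;

   // Hypothetical helper, not part of Nutch or of the NUTCH-3099 patch.
   class ProxyExceptionSketch {

     // Splits a comma-separated value into trimmed, lower-cased entries.
     static List<String> parse(String value) {
       return Arrays.stream(value.split(","))
           .map(s -> s.trim().toLowerCase(Locale.ROOT))
           .filter(s -> !s.isEmpty())
           .collect(Collectors.toList());
     }

     // Returns true if the host matches any entry and should skip the proxy.
     static boolean bypassesProxy(String host, List<String> entries) {
       String h = host.toLowerCase(Locale.ROOT);
       for (String entry : entries) {
         if (entry.startsWith("*")) {
           // Leading wildcard: compare the rest of the entry as a suffix.
           if (h.endsWith(entry.substring(1))) {
             return true;
           }
         } else if (entry.endsWith("*")) {
           // Trailing wildcard: compare the rest of the entry as a prefix.
           if (h.startsWith(entry.substring(0, entry.length() - 1))) {
             return true;
           }
         } else if (h.equals(entry)) {
           // No wildcard: exact host match.
           return true;
         }
       }
       return false;
     }
   }
   ```
   
   With the list `domain1.org,*.domain.com,some.domain.*`, this sketch would 
   bypass the proxy for `domain1.org`, `www.domain.com` and `some.domain.org`, 
   but not for `domain.com` or `www.domain2.com`.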
   Any thoughts?
   




> Allow wildcard '*' in http.proxy.exception.list
> -----------------------------------------------
>
>                 Key: NUTCH-3099
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3099
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.20
>            Reporter: Isabelle Giguere
>            Assignee: Isabelle Giguere
>            Priority: Major
>             Fix For: 1.22
>
>         Attachments: NUTCH-3099.2025-10-08.patch.txt, 
> NUTCH-3099.2025-10-09.patch.txt
>
>
> The Nutch setting "http.proxy.exception.list" should accept the '*' wildcard.
> The equivalent JVM property "http.nonProxyHosts" does allow '*' at the start 
> or end of a host name.
> https://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html
> Note that starting Nutch with -Dhttp.nonProxyHosts="some.host" has no effect: 
> crawling goes through the proxy anyway.  Only "http.proxy.exception.list" 
> can be used with Nutch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
