HiranChaudhuri commented on PR #845:
URL: https://github.com/apache/nutch/pull/845#issuecomment-2522247755

   Does it make sense to decide whether to strip authority data based on the 
protocol? I acknowledge that most users want to crawl the internet anonymously. 
But intranet operators, and users who want to index 'their' content, whether on 
local or remote servers, will need the authority data preserved, and they have 
no control over the protocol. So I suspect preserving it will sometimes be 
required even when https is used.
   
   How about making it configurable, maybe via a regexp? This would allow Nutch 
users to define the protocols, the sites, or other criteria for which the 
authority is preserved.
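
   To make the regexp idea concrete, here is a minimal sketch. It assumes that 
"authority data" means the userinfo part of the URL (the `user:password@` 
prefix of the authority). The class name `AuthorityPreserveSketch`, the method 
`normalize`, and the example pattern are all hypothetical; none of this is 
existing Nutch API or the code in this PR.

```java
import java.net.URI;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: strip userinfo from a URL unless the URL
 * (seen without its userinfo) matches a user-supplied "preserve" regexp.
 */
class AuthorityPreserveSketch {

    // Assumed to come from configuration; the property itself is not defined here.
    private final Pattern preserve;

    AuthorityPreserveSketch(String preserveRegex) {
        this.preserve = Pattern.compile(preserveRegex);
    }

    String normalize(String url) {
        // URI.create throws IllegalArgumentException for unparseable input.
        URI uri = URI.create(url);
        String userInfo = uri.getRawUserInfo();
        if (userInfo == null) {
            return url; // nothing to strip
        }

        // Cut "userinfo@" out of the raw authority and leave the rest verbatim.
        String authority = uri.getRawAuthority();
        int start = url.indexOf(authority);
        String stripped = url.substring(0, start)
            + authority.substring(userInfo.length() + 1)
            + url.substring(start + authority.length());

        // The regexp is matched against the stripped form, so rules can be
        // written without accounting for the credentials themselves.
        return preserve.matcher(stripped).find() ? url : stripped;
    }
}
```

   With a pattern such as `^https?://intranet\.example\.org(/|$)`, a URL on 
that host would keep its userinfo regardless of protocol, while every other 
URL would have it removed. Matching against the stripped URL is just one 
possible design choice.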


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
