Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/org%2eapache%2enutch%2enet%2eBasicUrlNormalizer

The comment on the change is:
adding page

New page:
= BasicUrlNormalizer Notes =

The Basic URL Normalizer class normalizes a URL in several ways:

 1. Trims white space from both ends of the URL (java.lang.String.trim()).
 1. Lower-cases the protocol (done by java.net.URL when it parses the URL).
 1. If the protocol is http or ftp, it also:
   a. lower-cases the host;
   a. removes the port if it is the protocol's default;
   a. adds a trailing slash if no file is specified;
   a. removes any reference text (the #anchor at the end of the URL);
   a. removes relative path segments such as /../, resolving them against the preceding directory.

For example:[[BR]]
{{{http://wiKI.apache.ORG:80/somedirectory/../DevelopmentCommandLineOptions}}}[[BR]]
would be rewritten as:[[BR]]
{{{http://wiki.apache.org/DevelopmentCommandLineOptions}}}[[BR]]

== Notes ==

Other than trimming white space and the normalization performed by java.net.URL, URLs with protocols other than http and ftp are not normalized further.
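The normalization steps described on this page can be sketched in Java roughly as follows. This is a hypothetical illustration, not the Nutch source: the class name `BasicUrlNormalizerSketch`, the helpers `normalize` and `removeDotSegments`, and the simplified dot-segment handling are assumptions made for this example. It relies only on `java.net.URL`, which parses the URL and lower-cases the protocol.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the normalization steps described on this page.
 * NOT the Nutch source: names and structure are illustrative assumptions.
 */
class BasicUrlNormalizerSketch {

    static String normalize(String urlString) {
        try {
            // 1. Trim white space (String.trim() strips both ends);
            // 2. parsing with java.net.URL lower-cases the protocol.
            URL url = new URL(urlString.trim());
            String protocol = url.getProtocol();

            // Only http and ftp URLs are normalized further.
            if (!protocol.equals("http") && !protocol.equals("ftp")) {
                return url.toString();
            }

            // 3a. Lower-case the host.
            String host = url.getHost().toLowerCase(Locale.ROOT);

            // 3b. Drop the port if it equals the default (80 for http, 21 for ftp).
            int port = url.getPort();
            if (port == url.getDefaultPort()) {
                port = -1; // -1 means "no explicit port"
            }

            // 3c. Add a trailing slash when no file is specified.
            String path = url.getPath();
            if (path.isEmpty()) {
                path = "/";
            }

            // 3e. Resolve relative segments such as "." and "..".
            path = removeDotSegments(path);

            // Keep the query string, if there is one.
            String query = url.getQuery();
            String file = (query == null) ? path : path + "?" + query;

            // 3d. The reference (#anchor) is dropped simply by not copying url.getRef().
            // User info (user:password@) is not carried over either in this sketch.
            return new URL(protocol, host, port, file).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + urlString, e);
        }
    }

    /** Simplified "." / ".." removal for an absolute path (one starting with '/'). */
    static String removeDotSegments(String path) {
        String[] segments = path.split("/", -1); // -1 keeps a trailing empty segment
        List<String> kept = new ArrayList<>();
        // segments[0] is the empty string before the leading '/', so start at 1.
        for (int i = 1; i < segments.length; i++) {
            String seg = segments[i];
            boolean last = (i == segments.length - 1);
            if (seg.equals("..")) {
                if (!kept.isEmpty()) {
                    kept.remove(kept.size() - 1); // step up one directory
                }
                if (last) {
                    kept.add(""); // "/a/b/.." becomes "/a/"
                }
            } else if (seg.equals(".")) {
                if (last) {
                    kept.add(""); // "/a/." becomes "/a/"
                }
            } else {
                kept.add(seg);
            }
        }
        return "/" + String.join("/", kept);
    }

    public static void main(String[] args) {
        // prints http://wiki.apache.org/DevelopmentCommandLineOptions
        System.out.println(normalize(
            "http://wiKI.apache.ORG:80/somedirectory/../DevelopmentCommandLineOptions"));
    }
}
```

Running `main` prints the normalized form of the example URL from this page. The sketch rebuilds the URL from its parts, so the reference and a default port are removed simply by not copying them over; it does not preserve user info, which the real class may handle differently.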
