When I try "harvesting" many sites, I see wget trying URIs like:

[1]http://the.target.host/some/subdirectory/index.html?source=navbar,search=search%20term

and some with complex query strings that push the URI way over the maximum path length. From Google, I see URIs where '@' is used rather than '?'. In both cases, the content returned is dynamic, dependent on the query.

IMNSHO, wget should simply leave such links as they are in the original page source and not try to retrieve them at all. I think that should be the default behavior, but it would be OK as an option.
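To make the proposed rule concrete, here is a rough sketch in Python of the filtering I have in mind. It is not wget code; the function names `is_dynamic_link` and `links_to_fetch` are made up for illustration, and treating an '@' in the path as a query marker is my assumption based on the Google-style URIs mentioned above.

```python
from urllib.parse import urlsplit


def is_dynamic_link(url: str) -> bool:
    """Guess whether a URL points at query-driven (dynamic) content.

    Hypothetical helper: a link counts as dynamic if it has a non-empty
    '?' query string, or if its path contains '@' (assumed here to be
    standing in for '?').
    """
    parts = urlsplit(url)
    # Only the path is checked for '@'; an '@' in the host part is
    # userinfo (user@host), not a query marker.
    return bool(parts.query) or "@" in parts.path


def links_to_fetch(links):
    """Keep only the links a harvester would retrieve under this rule."""
    return [url for url in links if not is_dynamic_link(url)]
```

Under this rule the example URI above would be skipped and left untouched in the page source, while a plain `index.html` link in the same directory would still be retrieved.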
References

1. http://the.target.host/some/subdirectory/index.html?source=navbar,search=search%20term
