When I try "harvesting" many sites, I see wget trying URIs like:
   [1]http://the.target.host/some/subdirectory/index.html?source=navbar,search=search%20term
   Others contain complex query strings that make the URI far longer
   than the maximum path length.
   From Google, I see URIs where '@' is used rather than '?'.
   In both cases, the content returned is dynamic, dependent on the
   query.
   IMNSHO, wget should simply leave such links in their original page
   source and not try to retrieve them at all.  I think that should be the
   default case, but it would be OK as an option.
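
To make the proposal concrete, here is a rough sketch of the kind of link filter described above. It is written in Python for brevity and is not wget code. The names should_skip and MAX_URL_LENGTH and the 255-character limit are assumptions chosen for illustration, and treating '@' in the path as a query marker is only a heuristic based on the Google URIs mentioned above.

```python
from urllib.parse import urlsplit

# Hypothetical limit; the real maximum depends on the local file system.
MAX_URL_LENGTH = 255


def should_skip(url, max_length=MAX_URL_LENGTH):
    """Return True if the link should be left as-is and not retrieved."""
    parts = urlsplit(url)
    # A non-empty query string means the content depends on the query.
    if parts.query:
        return True
    # Heuristic (assumption): some sites use '@' in place of '?'.
    if "@" in parts.path:
        return True
    # Overlong URIs cannot be mapped to a local file name.
    if len(url) > max_length:
        return True
    return False
```

A crawler would call should_skip() on each extracted link and simply leave the link untouched in the saved page when it returns True.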

References

   1. http://the.target.host/some/subdirectory/index.html?source=navbar,search=search%20term
