[Bug-wget] Handling query-strings and over-long URI's

David A. Cobb Mon, 19 Aug 2013 13:47:05 -0700

   When I try "harvesting" many sites, I see wget trying URIs like:
   [1]http://the.target.host/some/subdirectory/index.html?source=navbar,search=search%20term
   Some contain complex query strings, making a URI that is way over
   the maximum path length.
   From Google, I see URIs where '@' is used rather than '?'.
   In both cases, the content returned is dynamic, dependent on the
   query.
   IMNSHO, wget should simply leave such links in their original page
   source and not try to retrieve them at all.  I think that should be the
   default case, but it would be OK as an option.
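
   To make the proposal concrete, here is a rough sketch of the kind of
   filter I have in mind, written in Python rather than as a wget patch.
   The names should_skip and MAX_NAME_LEN are made up for illustration,
   and the 255-byte limit is only an assumed per-filename maximum, not
   anything wget defines.

```python
from urllib.parse import urlsplit

# Assumed limit for illustration only; real limits depend on the filesystem.
MAX_NAME_LEN = 255


def should_skip(url: str, max_name_len: int = MAX_NAME_LEN) -> bool:
    """Return True if the link should be left in the page and not fetched."""
    parts = urlsplit(url)

    # Any non-empty query string means the content is dynamic.
    if parts.query:
        return True

    # The last path component is roughly what would become the local
    # file name, so reject it if it is longer than the assumed limit.
    name = parts.path.rsplit("/", 1)[-1]
    return len(name.encode("utf-8")) > max_name_len
```

   A crawler would call should_skip() on each extracted link and leave
   the link untouched in the saved page whenever it returns True.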


References

   1. http://the.target.host/some/subdirectory/index.html?source=navbar,search=search%20term
