Hello everyone,

This is not a bug, but it is something I think should probably be developed,
as it does not seem that difficult to accomplish.  I was about to give the
patch a go myself, but upon looking at the source code I thought it might be
smarter to simply suggest the idea and see whether it is possible or easy to do.

Basically, there are sites out there, specifically vb forums, that require the
referrer to actually be the page that you came from (imagine that)!  My
project is to mirror an entire vb forum, and I got pretty far along doing
it: storing cookies, simulating POST logins, everything.  After many hours I
finally got in and am able to do it, but there is a problem.  On the
majority of the pages, if the referrer is not set (in the .wgetrc file, as
referer = http://somepage.com), the forum kicks the page to the log-in screen,
and what I am left with is hundreds of pages that are all 15 KB and are just
the log-in screen of the forum!

Now, if I manually change the referer to a certain directory within the
domain, I can see the page instead of a log-in page, but when I try to
follow the links on it and save them, it throws me back to the log-in screen.
After many hours of tedious and careful study, I realized that when I
changed the referrer manually, I was able to see the page I couldn't see
before, but only in that directory.  The second I tried to traverse one
directory deeper, it kicked me out because the referrer was then wrong.  I
studied the headers with Live HTTP Headers, and sure enough the referrer
value changes from request to request, so I assume their vb software is
programmed to pick it up and check it on every page load!

So, my question (or comment, or statement) is: how hard would it be to
implement a switch, say for example --recursive-referrer?  When this switch
is used, wget would set the 'referer' value for each request to whatever
page it just came from while traversing through all directories.  That would
enable full mirroring of sites that check the referrer and kick you out if
it is wrong (in this case vb forums).
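
To illustrate the behaviour I have in mind, here is a rough sketch in Python
(standard library only).  It is not how wget works internally, just the idea:
every link is queued together with the page it was found on, and that page is
sent as the Referer header when the link is fetched.  The names LinkParser,
fetch_page and crawl are made up for this example, and cookie/login handling
is left out entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url, referer=None):
    """Fetch url, sending a Referer header when one is given."""
    headers = {"Referer": referer} if referer else {}
    with urlopen(Request(url, headers=headers)) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetcher=fetch_page, max_pages=50):
    """Breadth-first crawl of one host; each page is requested with the
    page that linked to it as the Referer."""
    host = urlparse(start_url).netloc
    queue = deque([(start_url, None)])  # (url, page the link was found on)
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url, referer = queue.popleft()
        html = fetcher(url, referer)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, url))  # this page becomes the referer
    return pages
```

The fetcher argument is only there so the crawl logic can be tried out
without touching a real site.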

Thanks a lot for the otherwise great program, and I hope I adequately
described the problem I ran into!

I was wondering if there is a quick fix, such as piping the header output
to a file, using Perl or something to grab the referrer out of it, and then
piping that back into the next wget execution?  (That seems like a
hodgepodge to me and probably not a good solution.)
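
For what it is worth, a stopgap along those lines might look like the
following Python sketch, which runs wget once per URL and passes the previous
URL through the existing --referer option.  It assumes the URLs are already
known and listed in the order a browser would visit them, and that the login
cookies were saved to a file (cookies.txt here is just a placeholder name).
The helper names wget_command and fetch_chain are made up for this example.

```python
import subprocess


def wget_command(url, referer=None, cookie_file="cookies.txt"):
    """Build the argument list for one wget call, with an optional Referer."""
    cmd = ["wget", "--load-cookies", cookie_file]
    if referer:
        cmd.append(f"--referer={referer}")
    cmd.append(url)
    return cmd


def fetch_chain(urls, cookie_file="cookies.txt", run=subprocess.run):
    """Fetch urls in order; each request uses the previous URL as Referer."""
    previous = None
    for url in urls:
        run(wget_command(url, previous, cookie_file), check=True)
        previous = url
```

This only works for a hand-picked chain of pages, which is why a proper
switch in wget itself would be much nicer.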
