Todd Pattist wrote:
> I'm having trouble understanding how accept and reject work,
> particularly in the context of sites that rely on CGI and PHP to
> dynamically generate html pages. My questions relate to the following:
>
> 1) I don't fully understand the -A and -R effects and the difference, if
> any, between what links are traversed and parsed for deeper links,
> versus what files are kept and stored locally. The docs seem to say
> that -A and -R have no effect on the link traverse for html files, but
> this doesn't seem true for dynamically generated CGI, PHP files.
This "doesn't affect traversal of HTML files" functionality is currently implemented via a heuristic based on the filename extension. That is, if the name ends in ".htm" or ".html" (I believe), it will be traversed regardless of the -A or -R settings, whereas names ending in .cgi or .php get no such exemption, so -A/-R can stop them from being traversed. I'd have to look at the relevant code, but it's possible that "directory"-looking names (URLs ending in a slash) are also traversed automatically in the same way.

> Does html_extension=on affect link traversal?

No; this only affects whether filenames are changed upon download to explicitly include an ".html" extension (useful for local browsing).

> I'd like to be able to independently control link traversal vs. file
> retrieval with local file storage. Do the directory include/exclude
> commands allow this - do they work differently from -A -R?

I'm afraid I'm unsure what you are asking here.

> 2) The logs seem to show PHP files being retrieved and then not saved.
> When mirroring a forum, you often want to exclude links that do a
> logout, or subscribe you to a topic. Does -R prevent a dynamically
> generated html page from a PHP link from being traversed?

I think I'd need to see an example log of files "being retrieved and then not saved" to understand what you mean.

> 3) Which has priority if both reject and accept filters match?

Not sure; it's easy enough to test this yourself, though.

> 4) Sometimes the OS restricts filename characters. Do the -A and -R
> filters match on the final name used to store the file, or on the name
> at the server?

They should match the server's name (which includes the Content-Disposition name, if that's being used). However, there were at least some situations where the local name was being matched (at least in the case where -nd was being used); I can't recall whether that has been resolved yet, but I'm guessing not. Please feel free to report any other cases you encounter where local transformations result in erroneous matches from -A/-R.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
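For anyone who wants to reason about question 1 concretely, here is a small Python model of the extension heuristic as described in the reply above. It is a hypothetical sketch, not wget's actual code: the suffix list (.htm/.html), the automatic traversal of directory-looking URLs, the glob-versus-suffix pattern matching, ignoring the query string, and the rule that a reject match beats an accept match are all assumptions. The last one is exactly question 3, so check it against a real wget run before relying on it. The function names matches_any, is_acceptable and should_traverse are illustrative only.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the heuristic described above; this is NOT wget's source.
# Assumption: only these suffixes are exempt from -A/-R when deciding traversal.
HTML_SUFFIXES = (".htm", ".html")
WILDCARD_CHARS = "*?["


def matches_any(name: str, patterns: list[str]) -> bool:
    """True if name matches a pattern: as a glob if it has wildcards, else as a suffix."""
    for pat in patterns:
        if any(ch in pat for ch in WILDCARD_CHARS):
            if fnmatchcase(name, pat):
                return True
        elif name.endswith(pat):
            return True
    return False


def is_acceptable(name: str, accept: list[str], reject: list[str]) -> bool:
    """Assumption: a name must match -A (if any is given) and must not match -R."""
    if accept and not matches_any(name, accept):
        return False
    return not matches_any(name, reject)


def should_traverse(url_path: str, accept: list[str], reject: list[str]) -> bool:
    """Decide whether a link would be followed under the modeled heuristic."""
    # Keep only the last path component; the query string is ignored (assumption).
    name = url_path.split("?", 1)[0].rsplit("/", 1)[-1]
    if name == "":
        return True  # directory-looking URL: no file name to match
    if name.lower().endswith(HTML_SUFFIXES):
        return True  # .htm/.html pages are exempt from -A/-R
    return is_acceptable(name, accept, reject)


if __name__ == "__main__":
    reject = ["logout*", "subscribe*"]
    for path in ("/forum/", "/forum/index.html", "/forum/logout.php", "/forum/viewtopic.php"):
        print(path, should_traverse(path, [], reject))
```

Under these assumptions, a reject pattern like "logout*" would stop a PHP logout link from being followed, while a page ending in .html would still be traversed even if it matched the same pattern, which is the asymmetry Todd ran into.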