Hi all

This is just one example of the issue.

When I tried to mirror a WordPress-based site (http://media-mera.ru), I
noticed that the process took far too long.
I found that the cause is the "reply to" links that are provided for every
user comment:
http://media-mera.ru/articles/socially_useful?replytocom=6192#respond
http://media-mera.ru/articles/socially_useful?replytocom=6194#respond
http://media-mera.ru/articles/socially_useful?replytocom=6358#respond
etc.
In fact, all of these addresses are just the same page,
http://media-mera.ru/articles/socially_useful, so I added -R to the
command line:
-R '*replytocom=*'
but nothing changed, and after some googling I found this note about wget's
behaviour:
-------
http://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html#Types-of-Files
..
Note that these two options do not affect the downloading of html files (as 
determined by a ‘.htm’ or ‘.html’ filename
prefix). This behavior may not be desirable for all users, and may be changed 
for future versions of Wget.
..
-------
Yes, I absolutely agree that it should be changed: judging by wget's output,
the total downloaded traffic exceeds the size of the resulting saved mirror
by a factor of 10!
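
To illustrate why these links are redundant, here is a small Python sketch
(not part of wget; the helper name canonical() is made up for this example).
It drops the "replytocom" query parameter and the "#respond" fragment from the
three links quoted above and shows that they all collapse to a single page:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical(url, drop=("replytocom",)):
    """Return url without its fragment and without the query keys in drop.

    Hypothetical helper for illustration only; wget does nothing like this.
    """
    parts = urlsplit(url)
    # Keep every query parameter except the ones we consider noise.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    # An empty fragment argument removes "#respond".
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )


links = [
    "http://media-mera.ru/articles/socially_useful?replytocom=6192#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6194#respond",
    "http://media-mera.ru/articles/socially_useful?replytocom=6358#respond",
]

# A set removes duplicates once the links are normalized.
unique_pages = sorted({canonical(link) for link in links})
print(unique_pages)
# prints ['http://media-mera.ru/articles/socially_useful']
```

So every comment on a page adds one more full download of that same page,
which would explain the gap between the traffic and the size of the mirror.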

PS
wget has been running on this site for 30 minutes; httrack took only 1.5 minutes.

PPS
While writing this, I even found a dedicated WordPress plugin that is intended
to reduce the traffic caused by "replytocom" links:
http://wordpress.org/extend/plugins/replytocom-redirector/

-- 
with best regards
Dmitry Bolshakov
