I'm using wget to make a frozen, offline mirror of a wordpress.com site. The original HTML makes extensive use of <img srcset=...> (responsive design for different browser resolutions). wget is corrupting the comma-separated lists of images.

For example,

    wget --page-requisites --span-hosts https://theliteratelens.com/

downloads a set of files including theliteratelens.com/index.html, which contains the following element as the first instance of srcset (line breaks inserted by me and irrelevant fields omitted):

    <img width="350" height="248" src="
    https://theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=350&h=248&crop=1
    " class="attachment-suburbia-sticky size-suburbia-sticky wp-post-image" alt="" loading="lazy" srcset="
    https://theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=350&h=248&crop=1 350w,
    https://theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=150&h=106&crop=1 150w,
    https://theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=300&h=212&crop=1 300w" sizes="(max-width: 350px) 100vw, 350px" ... />

Note the srcset field, which references 3 versions of the image whose decoded URL tails look like "realistfrontcover_small.jpg?w=350&h=248&crop=1".

However, if I add --convert-links, e.g.

    wget --page-requisites --span-hosts --convert-links https://theliteratelens.com/

the same element in theliteratelens.com/index.html becomes:

    <img width="350" height="248" src="../
    theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=350&h=248&crop=1
    " class="attachment-suburbia-sticky size-suburbia-sticky wp-post-image" alt="" loading="lazy" srcset="../
    theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=350&h=248&crop=1p;crop=../theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=150&h=106&crop=1h=106&a../theliteratelens.files.wordpress.com/2017/12/realistfrontcover_small.jpg?w=300&h=212&crop=1300&h=212&crop=1 300w" sizes="(max-width: 350px) 100vw, 350px" ... />

i.e. the comma-separated list in the srcset has been badly corrupted. For instance, the end of the first path, which was originally

    ...h=248&crop=1 350w, https://theliteratelens.files.wordpress.com/2017/12...

becomes

    ...h=248&crop=1p;crop=../theliteratelens.files.wordpress.com/2017/12...

and the second boundary between elements starts as

    ...h=106&crop=1 150w, https://theliteratelens.files...

but ends up as

    ...h=106&crop=1h=106&a../theliteratelens.files...

What seems to be happening is that the convert-links logic finds the absolute URLs on the second host (https://theliteratelens.files.wordpress.com) and correctly maps them to relative paths (../theliteratelens.files.wordpress.com/), but at the same time it reaches back one space-delimiter too far and replaces those characters with a spurious sample from the preceding string.

I hope this helps identify the problem.

DAn.
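
P.S. For reference, the expected result of the conversion can be sketched roughly as below. This is only a minimal Python illustration, not wget's actual code. The names parse_srcset, localize and convert_srcset are hypothetical. The parser is a simplified take on the candidate splitting described in the HTML spec (it ignores parenthesised descriptors) and works on the attribute value after entity decoding. localize simply assumes the ../host/path layout seen in the converted src attribute above.

```python
from urllib.parse import urlsplit


def parse_srcset(value):
    """Split a decoded srcset value into (url, descriptor) pairs.

    Simplified: a URL is a run of non-whitespace characters; the
    descriptor is whatever follows up to the next comma.
    """
    candidates = []
    pos, n = 0, len(value)
    while pos < n:
        # Skip separators (whitespace and commas) before the next URL.
        while pos < n and (value[pos].isspace() or value[pos] == ","):
            pos += 1
        if pos >= n:
            break
        start = pos
        while pos < n and not value[pos].isspace():
            pos += 1
        url = value[start:pos]
        if url.endswith(","):
            # A trailing comma ends the candidate: no descriptor follows.
            url, descriptor = url.rstrip(","), ""
        else:
            start = pos
            while pos < n and value[pos] != ",":
                pos += 1
            descriptor = value[start:pos].strip()
        candidates.append((url, descriptor))
    return candidates


def localize(url):
    """Hypothetical mapping of an absolute URL to ../host/path?query.

    Assumes the referring page is saved one directory deep, e.g.
    theliteratelens.com/index.html.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        return url  # already relative, leave untouched
    local = "../" + parts.netloc + parts.path
    if parts.query:
        local += "?" + parts.query
    return local


def convert_srcset(value):
    """Rewrite each candidate URL on its own, keeping its descriptor."""
    converted = []
    for url, descriptor in parse_srcset(value):
        local = localize(url)
        converted.append(f"{local} {descriptor}" if descriptor else local)
    return ", ".join(converted)


base = ("https://theliteratelens.files.wordpress.com/2017/12/"
        "realistfrontcover_small.jpg")
original = (f"{base}?w=350&h=248&crop=1 350w, "
            f"{base}?w=150&h=106&crop=1 150w, "
            f"{base}?w=300&h=212&crop=1 300w")
print(convert_srcset(original))
```

With the srcset from the first listing, this yields three intact candidates, each starting with ../theliteratelens.files.wordpress.com/ and keeping its own 350w, 150w or 300w descriptor, which is the shape I would expect --convert-links to produce.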
