Yes! Multiplexing was indeed partially the culprit; I've changed it to --http2-request-window=5.
However, the download queue (AKA 'Todo') still gets enormous. That's why I wanted to use non-verbose mode in the first place: screens and screens of 'Adding url:'. There should really be a limit on how many URLs it adds!

Darshit, as it stands it doesn't look like --force-progress does anything, because --progress=bar forces the same non-verbose mode, and --force-progress is meant to be used in non-verbose mode.

However, the progress bar is still really... not useful. See here: https://i.imgur.com/KvbGmKe.png
It's a single bar displaying a nonsense percentage, and it sounds like with multiplexing there are supposed to be, by default, 30 transfers going concurrently.

> Both reduce RTT by 1, but they can't be combined.

I was using TLS Resume because, well, for a 300+GB download it just seemed to make sense, so it wouldn't have to check over 100GB of files before getting back to where I left off.

> You use TLS Resume, but you don't explicitly need to specify a file. By default it will use ~/.wget-session.

I figure a 300GB+ transfer should have its own session file, just in case I do something smaller between resumes that might overwrite .wget-session. Plus, you've got to remember I'm on WSL, and I'd rather keep relevant files within my normal folders than in my WSL filesystem.

On Sat, Apr 7, 2018 at 3:04 AM, Darshit Shah <dar...@gmail.com> wrote:

> Hi Jeffrey,
>
> Thanks a lot for your feedback. This is what helps us improve.
>
> * Tim Rühsen <tim.rueh...@gmx.de> [180407 00:01]:
> >
> > On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > > Thanks to the fix that Tim posted on gitlab, I've got wget2 running just fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given how fast it's downloading a ton of files at once, it seems like it must've been only a small gain.
>
> TCP Fast Open will not save you a lot in your particular scenario. It simply saves one round trip when opening a new connection.
> So, if you're using Wget2 to download a lot of files, you are probably only opening ~5 connections at the beginning and reusing them all. It depends on your RTT to the server, but 1 RTT when downloading several megabytes is already an insignificant amount of time.
>
> > > I've come across a few annoyances however.
> > >
> > > 1. There doesn't seem to be any way to control the size of the download queue, which I dislike because I want to download a lot of large files at once and I wish it'd just focus on a few at a time, rather than over a dozen.
> > The number of parallel downloads ? --max-threads=n
>
> I don't think he meant --max-threads. Given how he is using HTTP/2, there's a chance what he's seeing is HTTP Stream Multiplexing. There is also `--http2-request-window`, which you can try.
>
> > > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken pipe)' error to be thrown. It seems to be related to how certificate verification is handled upon resume, but I was worried at first that the WSL problems were rearing their ugly head again.
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is considered 'insecure', thus we have it disabled by default. There still is TLS False Start enabled by default.
> >
> > > 3. --no-check-certificate causes significantly more errors about how the certificate issuer isn't trusted to be thrown (even though it's not supposed to be doing anything related to certificates).
> > Maybe a bit too verbose - these should be warnings, not errors.
>
> @Tim: I think with `--no-check-certificate` these should be neither warnings nor errors. The user explicitly stated that they don't care about the validity of the certificate. Why add any information there at all? Maybe we keep it only in debug mode.
>
> > > 4. --force-progress doesn't seem to do anything despite being recognized as a valid parameter; using it in conjunction with -nv is no longer beneficial.
> > You likely want to use --progress=bar. --force-progress is to enable the progress bar even when redirecting (e.g. to a log file).
> > @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> I think the progress bar options are sometimes a little off since we don't have tests for those and I am the only one using them.
>
> When exactly did you try to use --force-progress? I will change the documentation today to reflect its actual use case. --force-progress is useful only in --quiet mode. Which, TBH, doesn't make much sense to me since simply --progress=bar will essentially put you in the same mode. AFAIR, this comes from trying to bring in option compatibility from Wget 1.x.
>
> @Tim: Adjusting behaviour to the same as Wget 1.x doesn't make a lot of sense for the progress bar. In Wget 1.x, the default mode is: progress bar + verbose. Whereas, in Wget2, progress-bar will effectively enable the non-verbose mode where only warnings and errors are printed. I am noting this down for now. When I have a little time, I will think about all the progress and verbosity options in Wget 1.x and make sure that they do something similar in Wget2. Though, they won't have the exact same behaviour.
>
> > > 5. The documentation is unclear as to how to disable things that are enabled by default. Am I to assume that --robots=off is equivalent to -e robots=off?
> >
> > -e robots=off should still work. We also allow --robots=off or --no-robots.
> >
> > > 6. The documentation doesn't document being able to use 'M' for chunk-size, e.g. --chunk-size=2M
> >
> > The wget2 documentation has to be brushed up - one of the blockers for the first release.
> >
> > > 7. The documentation's instructions regarding --progress are all wrong.
> > I'll take a look in the next few days.
>
> Thanks for the heads up. Will look into it when I look at the rest of the progress options.
>
> > > 8. The http/https proxy options return as unknown options despite being in the documentation.
> > Yeah, the docs... see above. Also, proxy support is currently limited.
> >
> > > Lastly I'd like someone to look at the command I've come up with and offer me critiques (and perhaps help me address some of the remarks above if possible).
> >
> > No need for --continue.
> > Think about using TLS Session Resumption.
> > --domains is not needed in your example.
>
> You use TLS Resume, but you don't explicitly need to specify a file. By default it will use ~/.wget-session.
>
> > Did you build with http/2 and compression support ?
> >
> > Regards, Tim
> >
> > > #!/bin/bash
> > >
> > > wget2 \
> > > `#WSL compatibility` \
> > > --restrict-file-names=windows --no-tcp-fastopen \
> > > \
> > > `#No certificate checking` \
> > > --no-check-certificate \
> > > \
> > > `#Scrape the whole site` \
> > > --continue --mirror --adjust-extension \
> > > \
> > > `#Local viewing` \
> > > --convert-links --backup-converted \
> > > \
> > > `#Efficient resuming` \
> > > --tls-resume --tls-session-file=.\tls.session \
> > > \
> > > `#Chunk-based downloading` \
> > > --chunk-size=2M \
> > > \
> > > `#Swiper no swiping` \
> > > --robots=off --random-wait \
> > > \
> > > `#Target` \
> > > --domains=example.com example.com
>
> --
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6