Hello,
On Thu, 14 Jan 2021, Walter Dnes wrote:
> I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
>sub-forum, on the Covid-19 case counts for Ontario, using provincial
>data. I download 2 files daily as source data. One of them is a PDF
>file, which is run through "pdftotext" and then parsed by a bash script
>(don't ask). Today, the command...
>
> wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
>
>...returns a zero-byte file. *BUT*, sticking the URL into the URL bar
>of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up the
>PDF file just fine. Is "wget" being blocked?
[..]
> I've tried setting --user-agent= with my browser's string as shown by
>https://www.whatismybrowser.com/detect/what-is-my-user-agent but no
>luck. Is there some way to get around this? I have not updated this
>past week, so I don't think the problem is at my end.
I could download that file just fine just now[1]. Try running 'wget'
with the '-S' option, which prints the server's response headers. Oh and:
[..]
WARNING: cannot verify files.ontario.ca's certificate, issued by
[..]
If you sent stderr to /dev/null, you would not have seen that warning.
So, try:
wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
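If this runs from a script, something along these lines might help you
see what went wrong instead of ending up with a silent zero-byte file.
It is only a sketch: the function name 'fetch' and the '.log' file name
are made up for illustration, and it leaves out --no-check-certificate
on purpose, so a certificate problem should show up as a failed run
with the reason in the log.

```shell
# Hypothetical wrapper: fetch a URL, keep wget's messages, reject empty files.
fetch() {
    url=$1
    out=$2
    log=${out}.log    # assumed log location, next to the output file

    # -S prints the server's response headers; wget writes its messages
    # to stderr, so keep them in a log instead of sending them to /dev/null.
    if ! wget -S -O "$out" "$url" 2>"$log"; then
        echo "wget failed, see $log" >&2
        return 1
    fi

    # 'test -s' is true only if the file exists and is not empty.
    if [ ! -s "$out" ]; then
        echo "empty download, see $log" >&2
        return 1
    fi
}
```

Called as e.g. fetch "$url" report.pdf || cat report.pdf.log, it prints
the saved wget messages whenever the download fails or comes back empty.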
BTW: did you know that you can let 'date' format that URL? E.g.:
wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
"$(date '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf')"
Just note that no unescaped '%' is allowed apart from the date/time
format sequences. So if a URL contains a literal '%', you need to
escape it with another '%', e.g.:
$(date '+foo%%20bar-%Y-%m-%d.pdf')
            ^^ this fella
In your case, the URL is clean ;)
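A quick way to check both cases without waiting for the calendar is to
feed 'date' a fixed day. This sketch assumes GNU date (for the '-d'
option); 'report_url' is just a made-up helper name.

```shell
# Hypothetical helper: build the report URL for a given day (default: today).
# Relies on GNU date's -d option to pick the day.
report_url() {
    date -d "${1:-today}" '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf'
}

report_url 2021-01-14
# prints https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf

# A literal '%' in the URL has to be doubled so date does not treat it
# as the start of a format sequence:
date -d 2021-01-14 '+foo%%20bar-%Y-%m-%d.pdf'
# prints foo%20bar-2021-01-14.pdf
```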
HTH,
-dnh
[1] $ TZ=America/Toronto date
Thu Jan 14 16:50:15 EST 2021
--
"Airplane travel is nature's way of making you look like your passport
photo." -- Al Gore