Hello,

On Thu, 14 Jan 2021, Walter Dnes wrote:
>  I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
>sub-forum, on the Covid-19 case counts for Ontario, using provincial
>data.  I download 2 files daily as source data.  One of them is a PDF
>file, which is run through "pdftotext" and then parsed by a bash script
>(don't ask).  Today, the command...
>
>  wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
>
>...returns a zero-byte file.  *BUT*, sticking the URL into the URL bar
>of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up the
>PDF file just fine.  Is "wget" being blocked?
[..]
>  I've tried setting --user-agent= with my browser's string as shown by
>https://www.whatismybrowser.com/detect/what-is-my-user-agent  but no
>luck.  Is there some way to get around this?  I have not updated this
>past week, so I don't think the problem is at my end.

I could download that file just fine just now[1]. Try running 'wget'
with the '-S' option, which prints the server's response headers. Oh, and:

[..]
WARNING: cannot verify files.ontario.ca's certificate, issued by
[..]

If you sent stderr to /dev/null, you would never have seen that warning.

So, try:

    wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
        https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
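
To catch the zero-byte case in a script instead of finding out later, a
check along these lines might help. This is only a sketch: the name
'check_download' is made up for illustration and is not something from
the original script.

```shell
# Hypothetical helper, not from the original thread: succeed only if the
# downloaded file exists and is larger than zero bytes.
check_download() {
    # [ -s FILE ] is true only when FILE exists and has a size above zero
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "empty or missing: $1" >&2
        return 1
    fi
}
```

It could be called right after wget, with wget's messages kept in a log
via its '-o' option rather than thrown away, for example
'wget -S -o wget.log -O report.pdf URL' followed by
'check_download report.pdf || cat wget.log' (the file names here are
placeholders).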

BTW: you know that you can let date format that URL? e.g.:

    wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
      "$(date '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf')"

Just note that no unescaped '%' is allowed in there apart from the
date/time format specifiers. So if a URL contains a literal '%', you
need to escape it with another '%', as in e.g.
    $(date '+foo%%20bar-%Y-%m-%d.pdf')
                ^^ this fella

In your case, the URL is clean ;)
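
To see both cases side by side without waiting for a particular day,
here is a small sketch. It assumes GNU date (for the '-d' option), and
the function names 'report_url' and 'escaped_name' are invented for this
example only.

```shell
# Hypothetical helpers, not part of the original thread.
# Both rely on GNU date's -d option to format an arbitrary day.

# Build the report URL for a given day (defaults to today).
report_url() {
    date -d "${1:-today}" '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf'
}

# A literal '%' in the format is doubled so that date does not read
# '%20' as a format specifier.
escaped_name() {
    date -d "${1:-today}" '+foo%%20bar-%Y-%m-%d.pdf'
}

report_url 2021-01-14
escaped_name 2021-01-14
```

The first call prints the URL from this thread, the second prints
foo%20bar-2021-01-14.pdf, i.e. the doubled '%%' comes out as a single '%'.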

HTH,
-dnh

[1] $ TZ=America/Toronto date
    Thu Jan 14 16:50:15 EST 2021

-- 
"Airplane travel is nature's way of making you look like your passport
photo."                                                     -- Al Gore
