Update of bug #47689 (project wget):

                  Status:                    None => Confirmed              

    _______________________________________________________

Follow-up Comment #2:

Downloading works, but this issue is about content parsing (recursive
downloads).

The server does not state a character encoding (charset), but the document
(index.html) begins with a BOM (Byte Order Mark) indicating that it is
UTF-16LE encoded.
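For illustration, here is a minimal C sketch of BOM detection. It assumes the byte sequences from the WHATWG encoding rules: EF BB BF for UTF-8, FE FF for UTF-16BE, and FF FE for UTF-16LE. The names sniff_bom and enum bom_encoding are hypothetical and are not part of wget.

```c
#include <stddef.h>

/* Encodings that can be identified by a byte order mark.
   Hypothetical names, not wget code. */
enum bom_encoding { BOM_NONE, BOM_UTF8, BOM_UTF16BE, BOM_UTF16LE };

/* Inspect the first bytes of a document.  On a match, store the BOM
   length in *bom_len so the caller can skip it before decoding. */
static enum bom_encoding
sniff_bom (const unsigned char *buf, size_t len, size_t *bom_len)
{
  *bom_len = 0;
  if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
    {
      *bom_len = 3;
      return BOM_UTF8;
    }
  if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
    {
      *bom_len = 2;
      return BOM_UTF16BE;
    }
  if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
    {
      *bom_len = 2;
      return BOM_UTF16LE;
    }
  return BOM_NONE;
}
```

The check happens before any parsing because, in the HTML spec's sniffing algorithm, an encoding found by BOM sniffing takes precedence over other encoding information.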

What needs to be done is to convert the input byte stream to UTF-8, which is
what wget can work with.
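Here is a minimal sketch of that conversion using POSIX iconv(3). It assumes that the BOM has already been detected and skipped. utf16le_to_utf8 is a hypothetical helper and is not how wget currently handles input.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert a UTF-16LE buffer (BOM already skipped) into a newly allocated,
   NUL-terminated UTF-8 string.  Returns NULL on failure.  Hypothetical
   helper, a sketch only. */
static char *
utf16le_to_utf8 (const char *in, size_t in_len)
{
  iconv_t cd = iconv_open ("UTF-8", "UTF-16LE");  /* (to, from) */
  if (cd == (iconv_t) -1)
    return NULL;

  /* A 2-byte code unit yields at most 3 UTF-8 bytes and a 4-byte
     surrogate pair yields 4, so in_len / 2 * 3 bytes (+1 for NUL)
     are always enough. */
  size_t out_size = in_len / 2 * 3 + 1;
  char *out = malloc (out_size);
  if (!out)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) in;  /* iconv takes char ** but does not write input */
  char *outp = out;
  size_t in_left = in_len, out_left = out_size - 1;
  if (iconv (cd, &inp, &in_left, &outp, &out_left) == (size_t) -1)
    {
      /* Invalid or truncated input (e.g. an odd byte count). */
      free (out);
      iconv_close (cd);
      return NULL;
    }
  *outp = '\0';
  iconv_close (cd);
  return out;
}
```

The result can then be passed to the existing code that uses C string functions. The BOM has to be skipped first because, with an explicit "UTF-16LE" source encoding, iconv would convert it to U+FEFF (EF BB BF in UTF-8) rather than drop it.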

Currently we assume that the input data can be processed with traditional C
string functions. UTF-16 data cannot be: every ASCII character is encoded as
two bytes, one of which is 0x00, so the embedded NUL bytes terminate the
string early.
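To illustrate, this short C sketch shows what strlen() and strstr() see when they are given UTF-16LE bytes. The buffer and helper names are hypothetical.

```c
#include <string.h>

/* "<html>" encoded as UTF-16LE without a BOM, plus a two-byte terminator.
   Every ASCII character is followed by a 0x00 (high) byte. */
static const char utf16le_html[] =
  { '<', 0, 'h', 0, 't', 0, 'm', 0, 'l', 0, '>', 0, 0, 0 };

/* strlen() stops at the first NUL byte, the high byte of '<', so it
   reports 1 instead of the 12 bytes of markup. */
static size_t
c_view_length (void)
{
  return strlen (utf16le_html);
}

/* strstr() sees the same one-character string "<", so it cannot find
   "html" even though the markup is present. */
static int
c_view_finds_html (void)
{
  return strstr (utf16le_html, "html") != NULL;
}
```

A parser built on these functions therefore sees only the first character of such a document.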

See https://html.spec.whatwg.org/multipage/syntax.html#the-input-byte-stream

@Eli: After downloading, try
$ wget -d -r --local-encoding=UTF-16LE --input-file index.html --force-html \
    --base http://www.free-energy-info.co.uk


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?47689>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/

