Update of bug #50383 (project wget):

                  Status:                    None => Confirmed              

    _______________________________________________________

Follow-up Comment #1:

Two problems here:

1. The command-line URL is converted by 'remote_to_utf8()' in url_parse().
This is wrong; locale_to_utf8() must be used instead.
In many locales this makes no difference for a tilde, but I noticed it
while tracing wget.

2. After dequeuing (before download), wget converts the complete URL with
remote_to_utf8(). This is wrong: only the part coming from the remote should
be converted (~foo came from local input).
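To illustrate the second problem, here is a minimal Python sketch (not wget code). The charsets, the example URL and the helper name to_utf8() are assumptions chosen for illustration only. It shows how decoding a whole URL with the remote charset mangles the part that was typed locally, while decoding each part with the charset of its origin does not.

```python
# Illustrative sketch only; not taken from wget.
# Assumed charsets: the local side uses UTF-8, the remote page ISO-8859-1.
LOCAL_CHARSET = "utf-8"
REMOTE_CHARSET = "iso-8859-1"


def to_utf8(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode raw bytes from the given charset."""
    return raw.decode(charset)


# The base URL was typed on the command line (locale encoding);
# the link text was found in a remote document (remote encoding).
local_part = "http://example.com/é/".encode(LOCAL_CHARSET)
remote_part = "ü.html".encode(REMOTE_CHARSET)

# Wrong: the complete URL is treated as if it came from the remote.
wrong = to_utf8(local_part + remote_part, REMOTE_CHARSET)

# Right: each part is converted with the charset of its origin.
right = to_utf8(local_part, LOCAL_CHARSET) + to_utf8(remote_part, REMOTE_CHARSET)

print(wrong)
print(right)
```

With these assumed charsets, the locally typed path component is corrupted in the first result and preserved in the second.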

Suggested fix:
The charset conversion to utf-8 should take place whenever input is taken
(from command line or from remote). Internally, wget should work with utf-8
only. That is what Wget2 already does.
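A rough Python sketch of that idea follows, under stated assumptions: the class UrlQueue and its method names are hypothetical and do not correspond to wget or Wget2 internals. Conversion happens once, at the point where the input enters, so the queue only ever holds decoded text and nothing needs converting after dequeuing.

```python
# Illustrative sketch only; UrlQueue and its methods are hypothetical names.
from collections import deque
from urllib.parse import urljoin


class UrlQueue:
    """Queue that stores only URLs already decoded to Unicode text."""

    def __init__(self) -> None:
        self._items = deque()

    def add_from_command_line(self, raw: bytes, locale_charset: str) -> None:
        # Local input is decoded with the locale charset, exactly once.
        self._items.append(raw.decode(locale_charset))

    def add_from_remote(self, base: str, raw_link: bytes,
                        remote_charset: str) -> None:
        # Only the link text from the remote document is decoded with the
        # remote charset; the base URL is already decoded text.
        self._items.append(urljoin(base, raw_link.decode(remote_charset)))

    def pop(self) -> str:
        # No charset conversion here: everything in the queue is decoded.
        return self._items.popleft()


queue = UrlQueue()
queue.add_from_command_line(b"http://example.com/~foo/", "ascii")
base_url = queue.pop()

# A link found in a page served as ISO-8859-1 (assumed for the example).
queue.add_from_remote(base_url, "bür.html".encode("iso-8859-1"), "iso-8859-1")
next_url = queue.pop()
print(next_url)
```

The design choice here is that the charset is consumed at the input boundary, so later stages never need to know where a URL came from.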

I have attached my Python test script that reproduces this issue, in case
someone wants to work on it. Copy it to testenv/ and start it manually, or add
it to Makefile.am.

(file #39814)
    _______________________________________________________

Additional Item Attachment:

File name: Test-link-shiftjis.py          Size:1 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50383>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/

