Your message dated Sun, 02 Mar 2008 17:00:16 +0100
with message-id <[EMAIL PROTECTED]>
and subject line fixed since some time
has caused the Debian Bug report #436566,
regarding lftp: Does not correctly encode UTF-8 symbols in URL.
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [EMAIL PROTECTED]
immediately.)
--
436566: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=436566
Debian Bug Tracking System
Contact [EMAIL PROTECTED] with problems
--- Begin Message ---
Package: lftp
Version: 3.5.6-1
Severity: important
Tags: patch
UTF-8 multi-byte characters are not correctly encoded into URLs. These
characters include, for example, accented vowels, and thus appear very
frequently in European languages such as French (my own).
Although UTF-8 encoded web pages are not widespread yet, I believe it is
good practice to encourage Unicode. Here is an example website that
fails with lftp:
> $ lftp http://files.iai.heig-vd.ch/Enseignement/ <<EOF
> cd Supports%20de%20cours/Acquisition\ de\ données\ \&\ CEM/
> EOF
Here is the output I get:
> $ lftp http://files.iai.heig-vd.ch/Enseignement/Supports%20de%20cours/
> cd ok, cwd=/Enseignement/Supports de cours
> lftp files.iai.heig-vd.ch:/Enseignement/Supports de cours>
> cd Acquisition\ de\ données\ \&\ CEM/
> cd: Access failed: 404 Not Found (/Enseignement/Supports de
> cours/Acquisition de données & CEM)
> lftp files.iai.heig-vd.ch:/Enseignement/Supports de cours> exit
I wrote a naïve patch to URL-encode some of these characters, and it
seems to work for the example page, but it still misses most UTF-8
characters. While I figure out how to do it correctly, maybe you can
point me to some relevant information or to upstream developers who
would be interested?
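For illustration, a more general approach than special-casing individual
lead bytes is to percent-encode every byte outside the RFC 3986
"unreserved" set, so that each byte of a multi-byte UTF-8 sequence becomes
its own %XX triplet. The following is only a minimal sketch under that
assumption: percent_encode_path and is_unreserved are hypothetical names,
not lftp functions, and keeping '/' literal is an illustrative choice for
path strings.

```cpp
#include <string>

// Hypothetical helper (not part of lftp): true for bytes that RFC 3986
// lists as "unreserved" and that never need percent-encoding.
static bool is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

// Hypothetical helper (not part of lftp): percent-encode a path byte by
// byte. '/' is kept literal as the path separator (an assumption of this
// sketch). Every byte >= 0x80, i.e. every byte of a multi-byte UTF-8
// sequence, is emitted as %XX.
std::string percent_encode_path(const std::string &path)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(path.size() * 3);
    for (unsigned char c : path) {
        if (is_unreserved(c) || c == '/') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this sketch, the "é" in "données" (bytes 0xC3 0xA9 in UTF-8) comes
out as %C3%A9, and three-byte sequences are handled the same way without
any per-character special case.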
Oh and thank you for maintaining this =)
--
billitch
-- System Information:
Debian Release: 4.0
APT prefers stable
APT policy: (990, 'stable'), (500, 'testing')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Versions of packages lftp depends on:
ii  libc6          2.3.6.ds1-13  GNU C Library: Shared libraries
ii  libexpat1      1.95.8-3.4    XML parsing C library - runtime li
ii  libgcc1        1:4.1.1-21    GCC support library
ii  libgcrypt11    1.2.3-2       LGPL Crypto library - runtime libr
ii  libgnutls13    1.4.4-3       the GNU TLS library - runtime libr
ii  libgpg-error0  1.4-1         library for common error values an
ii  libncurses5    5.5-5         Shared libraries for terminal hand
ii  libreadline5   5.2-2         GNU readline and history libraries
ii  libtasn1-3     0.3.6-2       Manage ASN.1 structures (runtime)
ii  netbase        4.29          Basic TCP/IP networking system
ii  zlib1g         1:1.2.3-13    compression library - runtime
lftp recommends no packages.
-- no debconf information
diff -ur lftp-3.5.6.orig/src/url.cc lftp-3.5.6/src/url.cc
--- lftp-3.5.6.orig/src/url.cc 2006-02-06 11:59:59.000000000 +0100
+++ lftp-3.5.6/src/url.cc 2007-08-08 08:08:37.000000000 +0200
@@ -441,6 +441,7 @@
/* Encodes the unsafe characters (listed in URL_UNSAFE) in a given
string, returning a malloc-ed %XX encoded string. */
+inline char *cat_quoted (char *p, const unsigned char c);
#define need_quote(c) (!unsafe || iscntrl((unsigned char)(c)) || strchr(unsafe,(c)))
char *url::encode_string (const char *s,char *res,const char *unsafe)
{
@@ -462,10 +463,12 @@
{
if (need_quote(*s))
{
- const unsigned char c = *s;
- *p++ = '%';
- sprintf(p,"%02X",c);
- p+=2;
+ p = cat_quoted (p, *s);
+ if ((unsigned char) *s == 0xC3 && s[1])
+ {
+ s++;
+ p = cat_quoted (p, *s);
+ }
}
else
*p++ = *s;
@@ -474,6 +477,14 @@
return res;
}
+inline char *cat_quoted (char *p, const unsigned char c)
+{
+ *p++ = '%';
+ sprintf(p,"%02X",c);
+ p+=2;
+ return p;
+}
+
bool url::dir_needs_trailing_slash(const char *proto)
{
if(!proto)
diff -ur lftp-3.5.6.orig/src/url.h lftp-3.5.6/src/url.h
--- lftp-3.5.6.orig/src/url.h 2006-02-06 12:00:06.000000000 +0100
+++ lftp-3.5.6/src/url.h 2007-08-08 08:04:48.000000000 +0200
@@ -47,7 +47,7 @@
char *Combine(const char *home=0,bool use_rfc1738=true);
};
-# define URL_UNSAFE " <>\"%{}|\\^[]`"
+# define URL_UNSAFE " <>\"%{}|\\^[]`\xC3"
# define URL_PATH_UNSAFE URL_UNSAFE"#;?"
# define URL_HOST_UNSAFE URL_UNSAFE":/"
# define URL_PORT_UNSAFE URL_UNSAFE"/"
--- End Message ---
--- Begin Message ---
Hello,
this bug has been fixed for some months.
I'm closing it.
Thanks for reporting.
--
Noèl Köthe <noel debian.org>
Debian GNU/Linux, www.debian.org
signature.asc
Description: This is a digitally signed message part
--- End Message ---