Hello,
Since GridFTP follows the URL RFCs, I found that this encoding problem
can be avoided by converting the input strings into percent-encoding
(http://en.wikipedia.org/wiki/Percent-encoding).
I therefore wrote a function that converts every non-ASCII byte to its
percent-encoded form. Once "globus-url-copy" calls this function before
parsing the URL, it is able to handle UTF-8 strings.
I wonder whether the GridFTP developers could take this into
consideration: it is only a few lines of code, and it would make life in
a multilingual world much easier.
Below is my code:
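#include <stdlib.h>
#include <string.h>

/* Percent-encode every non-ASCII byte of origin_url.
 * Returns a newly allocated string (caller frees), or NULL on failure. */
char *
globus_l_guc_convert_utf8_url(const char * origin_url)
{
    char * ascii_only_url;
    const char hex[] = "0123456789ABCDEF";
    size_t pos1, pos2;
    unsigned char c;

    /* Worst case: each byte becomes "%XX" (3 chars), plus the NUL. */
    ascii_only_url = (char *) malloc(strlen(origin_url) * 3 + 1);
    if (ascii_only_url == NULL)
        return NULL;
    pos1 = pos2 = 0;
    while (origin_url[pos1] != '\0') {
        /* Compare as unsigned: plain char may be signed or unsigned. */
        c = (unsigned char) origin_url[pos1++];
        if (c < 0x80)
            ascii_only_url[pos2++] = (char) c;
        else {
            ascii_only_url[pos2++] = '%';
            ascii_only_url[pos2++] = hex[c / 16];
            ascii_only_url[pos2++] = hex[c % 16];
        }
    }
    ascii_only_url[pos2] = '\0';
    return ascii_only_url;
}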
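
For comparison, here is a rough sketch of a variant that sizes the output
buffer exactly by scanning the input twice. The name
encode_non_ascii_exact is made up for illustration and is not part of
Globus. Like the function above, it only touches bytes outside 7-bit
ASCII and leaves a literal '%' in the input untouched.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Globus: percent-encode every byte
 * outside 7-bit ASCII. Returns a malloc'd string (caller frees), or
 * NULL if the allocation fails. */
char *encode_non_ascii_exact(const char *url)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t i, extra = 0, out = 0;
    char *encoded;

    assert(url != NULL);

    /* Pass 1: each non-ASCII byte grows from 1 to 3 output chars. */
    for (i = 0; url[i] != '\0'; i++)
        if ((unsigned char)url[i] >= 0x80)
            extra += 2;

    /* i now equals the input length; +1 for the terminating NUL. */
    encoded = malloc(i + extra + 1);
    if (encoded == NULL)
        return NULL;

    /* Pass 2: copy ASCII bytes, expand the rest to "%XX". */
    for (i = 0; url[i] != '\0'; i++) {
        unsigned char c = (unsigned char)url[i];
        if (c < 0x80) {
            encoded[out++] = (char)c;
        } else {
            encoded[out++] = '%';
            encoded[out++] = hex[c >> 4];
            encoded[out++] = hex[c & 0x0F];
        }
    }
    encoded[out] = '\0';
    return encoded;
}
```

The extra pass costs one more scan of the string, but it avoids reserving
three times the input length when only a few bytes need encoding.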
- Hai-Ning
----- Original Message -----
From: "Dan Gunter" <[EMAIL PROTECTED]>
To: "Hai-Ning Wu" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: Tuesday, May 06, 2008 9:29 PM
Subject: Re: [gt-user] Unable to transfer file names encoded in UTF8 using
GridFTP
The reason gridftp is picky is that URLs can only have US-ASCII
characters in them, and it doesn't want to do the encoding for you
because, I assume, that would be a fair amount of work. See RFC 1738,
1808, and 2396 for details if you are interested in tackling this
yourself. One approach may be to simply wrap the guc command with an
encoder.
-Dan
Hai-Ning Wu wrote:
Hello,
I tried to transfer files with Chinese file names using
"globus-url-copy" but failed to do so. The error message is "error:
[globus_gass_copy_get_url_mode]: globus_url_parse returned error code:
-8 for url: <my file path>"
To see which part went wrong, I traced the source code of
globus-url-copy and, finally, I found out that the problem came from
$gt_home/source-trees/common/source/library/globus_url.c.
This is a small piece of the code from globus_url_get_path() where
the problem occurs:
    if(isalnum((*stringp)[pos]) ||
       globusl_url_issafe((*stringp)[pos]) ||
       globusl_url_isextra((*stringp)[pos]) ||
       globusl_url_isscheme_special((*stringp)[pos]) ||
       (*stringp)[pos] == '~' || /* incorrect, but de facto */
       (*stringp)[pos] == '/' ||
       (*stringp)[pos] == ' ') /* to be nice */
    {
        pos++;
    }
The function "globus_url_get_path()" checks the validity of the path
before retrieving its substring. It only accepts ASCII characters and
rejects any other character. Chinese characters, however, are encoded
in UTF-8 as multi-byte sequences, and every byte of such a sequence has
"1" as its leading bit, so none of them passes this check. This is why
Chinese file names did not work with globus-url-copy.
I do not fully understand the purpose of the check in
globus_url_get_path(). It seems to me that the rest of the code would
work fine with non-ASCII characters. So I am wondering whether it would
be appropriate to let that function accept them, so that UTF-8 strings
can be used.
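
The point about leading bits can be checked with a small helper. This
is only an illustrative sketch; count_high_bit_bytes is a made-up name
and not a Globus function. It counts the bytes of a string that fall
outside 7-bit US-ASCII.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: count the bytes in s whose
 * most significant bit is set, i.e. bytes outside 7-bit US-ASCII. */
size_t count_high_bit_bytes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /* Cast first: plain char may be signed on some platforms. */
        if ((unsigned char)*s & 0x80)
            n++;
    }
    return n;
}
```

For the three-byte UTF-8 encoding of a single Chinese character such as
"\xE4\xB8\xAD" it returns 3, while for a plain ASCII string such as
"abc" it returns 0.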
By the way, I think it is important for grid middleware such as
Globus to support multiple languages, since grid computing requires
global cooperation. For example, if developers considered more than
ASCII, or programmed in Unicode, life would be much easier.
However, in my experience, most programs lack multi-language
features.
Any comments would be helpful. Thanks.
Hai-Ning
--
Hai-Ning Wu
Academia Sinica Grid Computing
Taipei, Taiwan
Email: [EMAIL PROTECTED]
--
Dan Gunter. voice:510-495-2504 fax:510-486-6363 dsd.lbl.gov/~dang