Hi,

This is something we can look into.  I believe we already do something
similar with filenames that are discovered via a list operation in a
recursive/wildcard transfer.  If you do a 'globus-url-copy
gsiftp://hostname/path/start-of-utf8-filename*', it may work.  Can you
open an enhancement bug report under GridFTP at http://bugzilla.globus.org?
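
In the meantime, a small wrapper of the kind Dan suggested could do the
encoding before globus-url-copy ever sees the URL.  The following is only a
rough sketch, not anything that exists in GridFTP: the function name
percent_encode_non_ascii is made up for illustration, and it assumes the
rest of the URL is already well-formed, so it touches nothing but bytes
with the high bit set.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of globus-url-copy: returns a newly
 * allocated copy of url in which every byte >= 0x80 is written as %XX.
 * ASCII bytes (including any existing '%') are copied unchanged.
 * The caller frees the result; returns NULL if malloc fails. */
char *
percent_encode_non_ascii(const char *url)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(url);
    /* Worst case: every byte expands to three characters, plus '\0'. */
    char *out = malloc(len * 3 + 1);
    size_t i, j = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        /* Go through unsigned char so the test does not depend on
         * whether plain char is signed on this platform. */
        unsigned char c = (unsigned char) url[i];
        if (c < 0x80) {
            out[j++] = (char) c;
        } else {
            out[j++] = '%';
            out[j++] = hex[c >> 4];
            out[j++] = hex[c & 0x0F];
        }
    }
    out[j] = '\0';
    return out;
}
```

For example, a path ending in the UTF-8 bytes E4 B8 AD followed by ".txt"
would come out ending in "%E4%B8%AD.txt", which the URL parser should
accept.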

Thanks,
Mike

Hai-Ning Wu wrote:
> Hello,
> 
> Since GridFTP follows the URL RFCs, I found that this encoding problem 
> can be avoided by converting the input strings to percent-encoding 
> (http://en.wikipedia.org/wiki/Percent-encoding).
> So I wrote a function that converts the non-ASCII bytes to 
> percent-encoding. If "globus-url-copy" calls this function before 
> it does anything else with the URL, it is able to handle UTF-8 strings.
> I am wondering whether the GridFTP developers could take this into 
> consideration, since it needs only a few lines of code and makes the 
> multilingual world much easier.
> 
> Below is my code:
> char *
> globus_l_guc_convert_utf8_url(char * origin_url)
> {
>    char * ascii_only_url;
>    char hex[17] = "0123456789ABCDEF";
>    int pos1, pos2;
> 
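>    /* worst case: each byte becomes "%XX" (3 chars), plus the final '\0' */
>    ascii_only_url = (char *) malloc(strlen(origin_url) * 3 + 1);
>    if (ascii_only_url == NULL)
>        return NULL;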
>    pos1 = pos2 = 0;
>    while (origin_url[pos1] != '\0') {
>        /* test as unsigned char: plain char may be signed or unsigned */
>        unsigned char c = (unsigned char) origin_url[pos1++];
>        if (c < 0x80)
>            ascii_only_url[pos2++] = (char) c;
>        else {
>            ascii_only_url[pos2++] = '%';
>            ascii_only_url[pos2++] = hex[c / 16];
>            ascii_only_url[pos2++] = hex[c % 16];
>        }
>    }
>    ascii_only_url[pos2] = '\0';
>    return ascii_only_url;
> }
> 
> 
> - Hai-Ning
> 
> ----- Original Message ----- From: "Dan Gunter" <[EMAIL PROTECTED]>
> To: "Hai-Ning Wu" <[EMAIL PROTECTED]>
> Cc: <[email protected]>
> Sent: Tuesday, May 06, 2008 9:29 PM
> Subject: Re: [gt-user] Unable to transfer file names encoded in UTF8 
> using GridFTP
> 
> 
>> The reason gridftp is picky is that URLs can only have US-ASCII
>> characters in them, and it doesn't want to do the encoding for you
>> because, I assume, that would be a fair amount of work. See RFC 1738,
>> 1808, and 2396 for details if you are interested in tackling this
>> yourself. One approach may be to simply wrap the guc command with an
>> encoder.
>>
>> -Dan
>>
>> Hai-Ning Wu wrote:
>>> Hello,
>>>
>>> I tried to transfer files with Chinese file names using
>>> "globus-url-copy" but failed to do so. The error message is "error:
>>> [globus_gass_copy_get_url_mode]: globus_url_parse returned error code:
>>> -8 for url: <my file path>"
>>>
>>> To see which part went wrong, I traced the source code of
>>> globus-url-copy and, finally,  I found out that the problem came from
>>>     $gt_home/source-trees/common/source/library/globus_url.c.
>>>
>>> This is a small piece of the code from globus_url_get_path() where
>>> the problem occurs:
>>>       if(isalnum((*stringp)[pos]) ||
>>>           globusl_url_issafe((*stringp)[pos]) ||
>>>           globusl_url_isextra((*stringp)[pos]) ||
>>>           globusl_url_isscheme_special((*stringp)[pos]) ||
>>>           (*stringp)[pos] == '~' || /* incorrect, but de facto */
>>>           (*stringp)[pos] == '/'||
>>>           (*stringp)[pos] == ' ') /* to be nice */
>>>        {
>>>            pos++;
>>>        }
>>>
>>> The function "globus_url_get_path()" checks the validity of the path
>>> before retrieving its substring. It only accepts ASCII characters and
>>> rejects any other characters. However, Chinese characters are
>>> encoded in UTF-8 as multi-byte sequences, and every byte of such a
>>> sequence has "1" as its leading bit, so none of them pass the check.
>>> This is why Chinese file names did not work with globus-url-copy.
>>>
>>> I do not fully understand the purpose of the code above. It seems
>>> that it would be fine for it to work with non-ASCII characters, so I
>>> am wondering whether it is appropriate to let that function accept
>>> them, in order to accept UTF-8 strings.
>>>
>>> By the way, I think it is important for grid middleware like
>>> Globus to support multiple languages, since grid computing requires
>>> global cooperation. For example, if developers considered more than
>>> ASCII, or programmed in Unicode, life would be much easier.
>>> However, in my experience, most programs lack multi-language
>>> features.
>>>
>>> Any comments would be helpful. Thanks.
>>>
>>> Hai-Ning
>>>
>>> -- 
>>> Hai-Ning Wu
>>> Academia Sinica Grid Computing
>>> Taipei, Taiwan
>>> Email: [EMAIL PROTECTED]
>>>
>>
>>
>> -- 
>> Dan Gunter. voice:510-495-2504 fax:510-486-6363 dsd.lbl.gov/~dang
>>
>>
> 
