On Sep 9, 2010, at 1:32, Maxwell, Adam R wrote:

> On Sep 8, 2010, at 10:40, Christiaan Hofman wrote:
> 
>> On Sep 8, 2010, at 19:27, Christiaan Hofman wrote:
>>> 
>>> 1. Apparently Apple has problems with the fragment separator in the URL 
>>> (#). Looking at our code, this used to work in the past because we 
>>> explicitly don't escape those, but somehow Apple has decided that it 
>>> doesn't like them, though it's a valid URI character. I don't know how that 
>>> should be handled, as escaping them would also be wrong.
>> 
>> Correction, it does not seem to be related to the "#" character. It is 
>> really a problem in the way we try to escape special characters. Though I 
>> don't see why that actually should give a problem, because in the end we do 
>> escape them. Anyway, this should be fixed in tomorrow's nightly.
> 
> Unfortunately, this URL has multiple encoded # characters and one that's not 
> supposed to be encoded, which is gross.  The real problem that the escaping 
> code is trying to solve is in dealing with partially-escaped URL strings, so 
> the call to CFURLCreateStringByReplacingPercentEscapes needs to be restored.  
> I'm pretty sure I filed a bug on this with Apple years ago, and the 
> documentation now includes an example doing exactly this with a DOI.
> 
> Here's a version that works with Themis' URL, although I haven't tested it 
> extensively:
> 
> + (NSURL *)URLWithStringByNormalizingPercentEscapes:(NSString *)string baseURL:(NSURL *)baseURL;
> {
>    CFStringRef urlString = (CFStringRef)string;
> 
>    if(BDIsEmptyString(urlString))
>       return nil;
> 
>    CFAllocatorRef allocator = baseURL ? CFGetAllocator((CFURLRef)baseURL) : CFAllocatorGetDefault();
> 
>    /*
>     Some badly formed URLs come in with multiple # characters, but a URL can only
>     contain a single fragment.  Use the leftmost # to denote the fragment as CFURL
>     does, and escape it separately.
>     */
>    NSRange fragmentRange = [string rangeOfString:@"#"];
>    CFStringRef fragmentString = NULL;
>    if (fragmentRange.location != NSNotFound && (NSUInteger)CFStringGetLength(urlString) > (fragmentRange.location + 1)) {
>        // skip the assumed fragment position
>        fragmentString = (CFStringRef)[string substringFromIndex:(fragmentRange.location + 1)];
>        urlString = (CFStringRef)[string substringToIndex:fragmentRange.location];
>        fragmentString = CFURLCreateStringByAddingPercentEscapes(allocator, (CFStringRef)fragmentString, CFSTR(""), NULL, kCFStringEncodingUTF8);
>        // autorelease to deal with multiple exit points
>        fragmentString = (CFStringRef)[(id)fragmentString autorelease];
>    }
> 
>    /*
>     Normalize the URL string to deal with partially escaped URLs.
>     CFURLCreateStringByAddingPercentEscapes will replace % in existing percent
>     escapes with a %25, which is the percent character escape.
>     */
>    CFStringRef unescapedString = CFURLCreateStringByReplacingPercentEscapes(allocator, urlString, CFSTR(""));
>    if(unescapedString == NULL) return nil;
> 
>    // we need to validate URL strings, as some DOI URLs contain characters that need to be escaped
>    urlString = CFURLCreateStringByAddingPercentEscapes(allocator, unescapedString, CFSTR(""), NULL, kCFStringEncodingUTF8);
>    CFRelease(unescapedString);
> 
>    if (fragmentString) {
>        // pass 0 (no length limit) as maxLength, since the fragment is appended below
>        CFMutableStringRef finalString = CFStringCreateMutableCopy(allocator, 0, urlString);
>        CFStringAppend(finalString, CFSTR("#"));
>        CFStringAppend(finalString, (CFStringRef)fragmentString);
>        CFRelease(urlString);
>        urlString = finalString;
>    }
> 
>    CFURLRef theURL = CFURLCreateWithString(allocator, urlString, (CFURLRef)baseURL);
>    CFRelease(urlString);
> 
>    return [(NSURL *)theURL autorelease];
> } 

This does not address the real problem, which is that the URL can contain
escaped characters that are otherwise allowed (like #, %, @, etc.). It just
does some horrible special-casing for #, but doesn't handle the others (like
the escaped @ that's also in this particular URL). So I think that both Apple's
sample and our code (which are basically equivalent) are wrong, because they
unescape characters that should remain escaped. The most important rule should
be that any code we use never changes a URL that's already completely valid
(fully escaped), like this particular one. Apple's sample and our code don't
satisfy this requirement. And remember that the whole point of this cleaning
is to correct invalid input from the user. The cure should not be worse than
the illness. That's why I solved it in the simplest way possible: just escape
the invalid characters, and leave existing escapes untouched by not escaping %.
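
To illustrate the rule, here is a minimal sketch in plain C. It is not the
actual BibDesk code and deliberately does not use CFURL. The names
escape_invalid_only and url_char_is_allowed, and the choice of allowed
characters (the RFC 3986 reserved and unreserved sets, plus %), are
assumptions made for this example. It escapes only bytes that cannot appear
literally in a URL and never rewrites an existing % escape, so a fully escaped
URL should come out unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a CoreFoundation or BibDesk function).
   Returns nonzero if c may appear literally in a URL: RFC 3986 unreserved
   and reserved characters, plus '%' so existing escapes are left alone. */
static int url_char_is_allowed(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return 1;
    /* strchr also matches the terminating NUL, so exclude it explicitly */
    return c != '\0' && strchr("-._~:/?#[]@!$&'()*+,;=%", c) != NULL;
}

/* Hypothetical sketch: copy `in` to `out`, percent-escaping only the bytes
   that are not allowed in a URL. Existing escapes such as %40 or %23 are
   copied through untouched because '%' itself is never escaped.
   Returns the length written, or (size_t)-1 if `out` is too small. */
size_t escape_invalid_only(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)in; *p != '\0'; p++) {
        size_t need = url_char_is_allowed(*p) ? 1 : 3;
        if (n + need + 1 > outsize)      /* +1 leaves room for the NUL */
            return (size_t)-1;
        if (need == 1) {
            out[n++] = (char)*p;
        } else {
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0F];
        }
    }
    if (n + 1 > outsize)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

With a made-up input such as "http://example.com/a%40b%23c#frag" the output
should be identical to the input, while the space in "http://example.com/a b"
becomes %20. The trade-off is that a stray % that is not part of a valid
escape is also left alone, so such input stays invalid.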

Christiaan


_______________________________________________
Bibdesk-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bibdesk-users
