On Sep 12, 2008, at 3:56 PM, Kai wrote:

When NSXMLParser hits a character entity like &auml; (the German umlaut 'ä'), it sends parser:resolveExternalEntityName:systemID: to its delegate, and if this is not implemented or returns nil, parser:parseErrorOccurred: is called with NSXMLParserUndeclaredEntityError.

Am I supposed to resolve all these character entities myself? And if so, what should the NSData object returned by parser:resolveExternalEntityName:systemID: contain? Unicode? Which Unicode encoding?

But this can’t be, can it? I must be missing something simple.

Thanks for any hints
Kai


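The main problem is that entities like &auml; are defined by HTML, not by XML. XML itself predefines only five entities (&amp;, &lt;, &gt;, &apos; and &quot;); anything else has to be declared in a DTD, which is why NSXMLParser reports &auml; as undeclared.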

I haven't dealt with this problem myself but I was curious so I tried a few things.



My first attempt was to use NSAttributedString to convert the HTML entity to a UTF-8 string.

- (NSData *)parser:(NSXMLParser *)parser resolveExternalEntityName:(NSString *)entityName systemID:(NSString *)systemID
{
        // Rebuild the entity reference (e.g. "&auml;") and let the HTML importer decode it.
        NSData *htmlData = [[NSString stringWithFormat:@"&%@;", entityName] dataUsingEncoding:NSUTF8StringEncoding];
        NSAttributedString *entityString = [[[NSAttributedString alloc] initWithHTML:htmlData documentAttributes:NULL] autorelease];

        NSLog(@"resolved entity name: %@", [entityString string]);

        // Hand the decoded character(s) back to the parser as UTF-8 data.
        return [[entityString string] dataUsingEncoding:NSUTF8StringEncoding];
}

This works: parser:foundCharacters: receives the ä. For some reason, though, parser:parseErrorOccurred: is still called with the same error you received: "Operation could not be completed. (NSXMLParserErrorDomain error 26.)"

The parser does continue and parses the file correctly (with the ä); the spurious error just makes it hard to tell when you have real errors. I'm really curious as to why this doesn't work (running 10.5.4 on Intel). It is also odd that the parser keeps parsing after the error, when the documentation says it will stop.



Another option is to add an XHTML DOCTYPE declaration to the file and call setShouldResolveExternalEntities: with YES (the default is NO). This works with no errors because the DTD defines the entities.
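
As a rough sketch of that setup (not something taken from a shipping project): the file would start with a DOCTYPE line such as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">, and the parser would be configured as below. ParseXHTMLFile and its parameters are hypothetical names used only for illustration; the code uses manual retain/release like the delegate method above.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: parse the file at `path` with external entity
// resolution switched on, so entities declared in the DTD can be expanded.
static BOOL ParseXHTMLFile(NSString *path, id delegate)
{
    NSURL *url = [NSURL fileURLWithPath:path];
    NSXMLParser *parser = [[[NSXMLParser alloc] initWithContentsOfURL:url] autorelease];

    [parser setDelegate:delegate];
    // Off by default; without this the DTD named in the DOCTYPE is not consulted.
    [parser setShouldResolveExternalEntities:YES];

    // Returns NO if parsing failed; details arrive via parser:parseErrorOccurred:.
    return [parser parse];
}
```

The delegate is whatever object already implements parser:foundCharacters: and the other callbacks you care about.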

However, NSXMLParser will download the DTD over the network every time you parse a file, so you probably want to keep a local copy of one of the DTDs (say http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd ). Although I didn't try it, you could copy just the entity definitions into your own DTD to make the file smaller and parsing faster.
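
To give an idea of what such a cut-down set of definitions could look like, here is an untested sketch that declares a few entities in the document's internal DTD subset instead of a separate DTD file. The root element name and the sample text are made up; the code points are the standard ones for these characters in the XHTML Latin-1 entity set. Whether NSXMLParser then expands them without further help is an assumption I have not verified.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
  <!-- Declare only the entities the document actually uses. -->
  <!ENTITY auml  "&#228;">  <!-- ä -->
  <!ENTITY ouml  "&#246;">  <!-- ö -->
  <!ENTITY uuml  "&#252;">  <!-- ü -->
  <!ENTITY szlig "&#223;">  <!-- ß -->
]>
<root>Sch&ouml;ne Gr&uuml;&szlig;e, &auml;hnlich wie gestern.</root>
```

The same <!ENTITY ...> lines could equally go into a small local .dtd file referenced from the DOCTYPE, which is closer to the suggestion above.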



Of course, if the content really is XHTML, you should be using an HTML parser rather than an XML one.

--Nathan

