Hi Daniel,

sorry for the late reply. I think your question is quite important, as most
issues with RDF seem to be encoding-related.

On 12. Sep. 2011, at 18:48, Gerber Daniel wrote:

> But still, why are the URIs not encoded this way?

Short answer: because that would have been too easy and it's not how it
developed over time :)

Longer answer: RDF came along quite some time after URIs and initially was
tied to its XML serialization. Now, XML already had its own way to escape
non-ASCII values (numeric character references such as &#x14D;), and the
plain-text serializations use \uXXXX or \UXXXXXXXX for the same purpose. This
kind of escaping for literals made it into several formats such as N-Triples,
N3 and Turtle.

Now to the "URIs"...

>> <http://dbpedia.org/resource/Emperor_Ninkō>
>> <http://www.w3.org/2000/01/rdf-schema#label> "Nink\u014D"@de .

Your terminology here is not very precise, probably because we all tend to be
lazy and just call everything that looks like a URI a URI :). To explain
this, precise terms help:

The <http://dbpedia.org/resource/Emperor_Ninkō> is an "IRI Reference" (it even
is an IRI, as it's not relative). These were formerly called "RDF URI
references" (not to be confused with "URI reference" from the URI RFC), in
anticipation of the IRI RFC:

http://www.w3.org/TR/rdf-concepts/#dfn-URI-reference

> A URI reference within an RDF graph (an RDF URI reference) is a Unicode
> string [UNICODE] that:
>
> • does not contain any control characters ( #x00 - #x1F, #x7F-#x9F)
> • and would produce a valid URI character sequence (per RFC2396 [URI],
>   sections 2.1) representing an absolute URI with optional fragment
>   identifier when subjected to the encoding described below.
>
> The encoding consists of:
>
> • encoding the Unicode string as UTF-8 [RFC-2279], giving a sequence of
>   octet values.
> • %-escaping octets that do not correspond to permitted US-ASCII
>   characters.
> [...]
> Note: this section anticipates an RFC on Internationalized Resource
> Identifiers. Implementations may issue warnings concerning the use of RDF URI
> References that do not conform with [IRI draft] or its successors.

Now what does this mean?

The <http://dbpedia.org/resource/Emperor_Ninkō> is a Unicode string (!= UTF-8
string), which can be turned into a valid "URI character sequence" by
following the steps described above. In order to dereference such an IRI we
need to transform it into its URI equivalent and then use HTTP.

The IRI RFC says as much in sec. 1.2.a: http://tools.ietf.org/html/rfc3987

> "On the other hand, in the HTTP protocol [RFC2616], the Request URI is
> defined as a URI, which means that direct use of IRIs is not allowed in HTTP
> requests."

This means that while it is allowed to identify things in RDF with IRIs, it
isn't possible to look them up without first encoding them as a %-escaped
UTF-8 string, which then is an (ASCII) URI.

Now, you might remember that you can just copy
http://dbpedia.org/resource/Emperor_Ninkō into your browser and get results.
Correct, but that's because most browsers do the IRI -> URI magic under the
hood, so you don't see that they actually request
http://dbpedia.org/resource/Emperor_Nink%C5%8D (which DBpedia then redirects
to http://dbpedia.org/page/Emperor_Nink%C5%8D). (In Firefox hit CTRL + i (win)
or CMD + i (mac) to see this.)

Cheers,
Jörn
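
P.S. In case it helps, here is a rough Python sketch of the IRI -> URI step
described above (encode as UTF-8, then %-escape the octets that are not
permitted ASCII). The function name iri_to_uri and the SAFE character set are
my own assumptions for illustration, not something taken from the RDF or IRI
specs, and the sketch does not deal with non-ASCII host names, which would
need IDNA rather than %-escaping.

```python
from urllib.parse import quote

# ASCII characters left untouched: the URI reserved characters plus "%",
# so that input which is already %-escaped is not escaped a second time.
# Letters, digits and "-._~" are always kept by quote().
# This set is an assumption made for the sketch.
SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Encode the IRI as UTF-8 and %-escape every octet not covered by SAFE."""
    return quote(iri, safe=SAFE, encoding="utf-8")


iri = "http://dbpedia.org/resource/Emperor_Nink\u014d"  # \u014d is "ō"
print(iri_to_uri(iri))
# http://dbpedia.org/resource/Emperor_Nink%C5%8D
```

A pure-ASCII URI passes through unchanged, so it should be safe to run this
over a mix of IRIs and URIs before handing them to an HTTP client.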
