Mon, 25 May 2009 17:18:19 +0100, richard wrote:
>>>> <http://dbpedia.org/resource/%C4%8C%C3%A1raj%C3%A1vri> ...
>>>> How can %C4%8C be decoded? Obviously it's not Unicode.
>>> That is URL encoding.
>>
>> I should have given more detail here: if I URL-decode the above,
>> I don't know what the result should be. UTF-8?
> 
> Yes. The byte sequence that you get after decoding the %-encoding is  
> to be turned into a character sequence by using UTF-8.

> echo resource/%C4%8C%C3%A1raj%C3%A1vri | urldecode          
resource/Äárajávri
> echo resource/Äárajávri | unihist
Invalid UTF-8 code encountered at line 0, character 9, byte 9.
The sequence is not a valid UTF-8 character because
the first byte, value 0xC4, bit pattern 11000100,
requires 1 continuation bytes, but of the immediately
following bytes, byte 1, value 0xC3, bit pattern
11000100 is not a valid continuation byte, since
its high bits are not 10.
> 
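
For comparison, here is a minimal Python sketch of the two steps described
in the quoted reply: undo the %-encoding to get bytes, then interpret those
bytes as UTF-8. It uses only the standard library; the variable names are
illustrative and it is not meant to reproduce what urldecode or unihist do
internally.

```python
from urllib.parse import unquote_to_bytes

# Path portion of the DBpedia URI from the message above.
encoded = "resource/%C4%8C%C3%A1raj%C3%A1vri"

# Step 1: undo the %-encoding. The result is a byte sequence, not text yet.
raw = unquote_to_bytes(encoded)

# Step 2: turn the bytes into characters by decoding them as UTF-8.
text = raw.decode("utf-8")

print(raw)   # the raw bytes, still undecoded
print(text)  # resource/Čárajávri
```

Here the bytes 0xC4 0x8C together encode the single character U+010C, so
the 0xC4 lead byte is followed by a valid continuation byte as long as
nothing drops or re-encodes 0x8C between the two steps.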
