Mon, 25 May 2009 17:18:19 +0100, richard wrote:
>>>> <http://dbpedia.org/resource/%C4%8C%C3%A1raj%C3%A1vri> ...
>>>> How can %C4%8C be decoded? Obviously it's not Unicode.
>>> That is URL encoding.
>>
>> I should have spent some more details here: If I url-decode the above,
>> I don't know what the result should be. UTF-8?
>
> Yes. The byte sequence that you get after decoding the %-encoding is
> to be turned into a character sequence by using UTF-8.
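
The two steps described in the quoted reply (undo the %-encoding to get bytes, then read those bytes as UTF-8) can be sketched in Python roughly as follows. This is only an illustration using the standard library; the helper names decode_percent_utf8 and is_continuation_byte are made up for this example and are not part of any DBpedia tooling. The Latin-1 line is an assumption about how output starting with "Ä" could arise, not a confirmed diagnosis.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_utf8(encoded: str) -> str:
    """Hypothetical helper: %-decode to bytes, then interpret the bytes as UTF-8."""
    raw = unquote_to_bytes(encoded)   # step 1: bytes, not yet characters
    return raw.decode("utf-8")        # step 2: raises UnicodeDecodeError if invalid


def is_continuation_byte(value: int) -> bool:
    """A UTF-8 continuation byte has the high bits 10 (pattern 10xxxxxx)."""
    return (value & 0xC0) == 0x80


path = "resource/%C4%8C%C3%A1raj%C3%A1vri"
raw_bytes = unquote_to_bytes(path)
text = decode_percent_utf8(path)

# 0xC4 opens a two-byte sequence, so exactly one continuation byte must follow.
follows_c4_ok = is_continuation_byte(0x8C)    # the byte that follows in the URL
follows_c4_bad = is_continuation_byte(0xC3)   # another lead byte, not a continuation

# Assumption: reading the same bytes as Latin-1 gives one character per byte,
# so the first decoded character after "resource/" becomes U+00C4.
as_latin1 = raw_bytes.decode("latin-1")
```

Here text is "resource/" followed by the nine-character name whose first letter is U+010C, while as_latin1 has one character per byte and begins with U+00C4 after the slash. Keeping the data as bytes until the UTF-8 step avoids depending on whatever encoding a terminal or pipe assumes.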

> echo resource/%C4%8C%C3%A1raj%C3%A1vri | urldecode
resource/Äárajávri
> echo resource/Äárajávri | unihist
Invalid UTF-8 code encountered at line 0, character 9, byte 9.
The sequence is not a valid UTF-8 character because the first byte,
value 0xC4, bit pattern 11000100, requires 1 continuation bytes, but
of the immediately following bytes, byte 1, value 0xC3, bit pattern
11000100 is not a valid continuation byte, since its high bits are not 10.
>

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
