On Sat, Feb 7, 2009 at 9:25 AM, John Doe <[email protected]> wrote:
> I am having difficulty parsing some Turkish sites. Here is the part of
> the code. The problem is that when the title contains non-UTF
> characters like ç, ü, ı, ö, ğ, it stops parsing and doesn't read the
> rest. For example, if the title is "Ebru Gündeş askere gitti", it only
> reads until "ş", which is "Ebru G". Or when reading "Serdar Ortaç
> sünnet oldu", it only reads "Serdar Or".
> How can I fix the problem? Any suggestions?

Well, what you probably mean is "non-ASCII". Those are all perfectly
satisfactory Unicode characters. What you need to do is figure out what
"encoding" your text is in. At the top of the XML there should be a line
that looks like

  <?xml version="1.0" encoding="XXX" ?>

If the encoding of your file is UTF-8 (the default), you don't need the
encoding= bit.

So one thing would be to convert your file to UTF-8; there are a bunch
of encoding conversion tools around. Another would be to figure out the
encoding (quite possibly it's ISO-Latin-1, whose official name is
"ISO-8859-1") and put that name in for "XXX" above. Note, though, that
ş, ğ and ı aren't in Latin-1, so Turkish text is more likely to be in
ISO-8859-9 or windows-1254. Another would be to convert the non-ASCII
characters to what are called "numeric character references". For
example, ç is U+00E7 in Unicode, so you can include it as &#xE7; in your
XML and everything will be fine.

The real lesson is that internationalization is hard. I've written a
couple of things on this that are widely read and might be useful:

http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode
http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF

 -Tim

You received this message because you are subscribed to the Google Groups "Android Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/android-developers?hl=en
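Since this is the Android Developers list, here is a minimal Java sketch of the "figure out the encoding" route from the parsing side. It assumes the feed is read with the standard javax.xml.parsers DOM API (the asker's actual code isn't shown). The key idea is to hand the parser raw bytes rather than a String or a Reader built with some guessed charset, so the parser honours the encoding= declaration itself. The class name FeedTitles, the title element lookup and the ISO-8859-9 sample are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class FeedTitles {
    // Parse raw bytes (an InputStream): the parser reads the encoding="..."
    // declaration itself, or assumes UTF-8 if there is none.
    static String firstTitle(InputStream rawXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(rawXml);
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical feed stored as ISO-8859-9 bytes, with a matching declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-9\"?>"
                + "<rss><channel><item><title>Ebru G\u00fcnde\u015f askere gitti</title>"
                + "</item></channel></rss>";
        byte[] bytes = xml.getBytes(Charset.forName("ISO-8859-9"));
        System.out.println(firstTitle(new ByteArrayInputStream(bytes)));
    }
}
```

If the declaration is missing or wrong, no parser can guess reliably; that is when converting the file or fixing the declaration becomes necessary.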

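A similarly hedged Java sketch of converting a file to UTF-8, assuming you already know the legacy charset (ISO-8859-9, the Turkish Latin charset, is used here purely as an example). After re-encoding, the old encoding= declaration would be wrong, so the sketch strips it and lets the parser fall back to the UTF-8 default. Utf8Converter and toUtf8 are made-up names, and the regular expression only handles a declaration at the very start of the document.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8Converter {
    // Matches the encoding="..." (or encoding='...') pseudo-attribute of an
    // XML declaration at the very start of the document; group 1 keeps the rest.
    private static final String ENCODING_DECL =
            "^(<\\?xml[^>]*?)\\s+encoding\\s*=\\s*(\"[^\"]*\"|'[^']*')";

    // Decode bytes from a known legacy charset, drop the now-incorrect
    // encoding declaration (UTF-8 is the XML default), and re-encode as UTF-8.
    static byte[] toUtf8(byte[] legacyBytes, Charset legacyCharset) {
        String text = new String(legacyBytes, legacyCharset);
        String fixed = text.replaceFirst(ENCODING_DECL, "$1");
        return fixed.getBytes(StandardCharsets.UTF_8);
    }
}
```

Dropping the declaration's encoding rather than rewriting it to "UTF-8" is a matter of taste; both are valid.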

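And a sketch of the numeric-character-reference route: every non-ASCII character becomes &#x...; with its Unicode code point in hex, so ç (U+00E7) comes out as &#xE7;. The helper name escapeNonAscii is hypothetical, and it is meant for text content such as titles, not for markup.

```java
import java.util.Locale;

class CharRefs {
    // Replace every non-ASCII code point with a hexadecimal numeric
    // character reference; ASCII characters are copied through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                // Locale.ROOT avoids locale-dependent case mapping (e.g. Turkish).
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out.toString();
    }
}
```

The output is pure ASCII, so it reads correctly under the UTF-8 default or any ASCII-compatible declared encoding. It does not escape & or <; those still need the usual &amp; and &lt; treatment.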