I did some work with a Japanese site, and we used Shift_JIS, a legacy
Japanese multi-byte encoding. We would store Shift_JIS data in the
database, but then we had some issues reading the stored data back. The
characters were entered as Shift_JIS and stored as UCS-2 (UTF-16) in SQL
Server. We tried reading them straight from the database and displaying
them on screen without any byte encoding conversion, but they wound up
looking all wrong; the browser did not handle the conversion properly.
We then read the data from the database and used the Java
String.getBytes(String charsetName) method to reset the encoding.
However, String.getBytes did not work properly for us either. We wound
up writing our own conversion, which was quite simple, and everything
worked. So, as far as I know, all the glyph representations that are
available in UTF-8 are available in UTF-16, and it is possible to
convert back and forth between the two so long as a glyph does not
exceed the UTF-8 glyph storage size. But I think UTF-16 has the
potential to store more complex glyphs. Maybe I'm wrong, but that is
my impression from all of this.
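In Java the two directions of conversion are easy to mix up: new String(bytes, charset) decodes bytes into text, while String.getBytes(charset) encodes text into bytes, and a String is already held as UTF-16 internally. The sketch below decodes Shift_JIS input and checks that the text survives a round trip through both UTF-8 and UTF-16. It assumes the JRE provides the "Shift_JIS" charset (standard JDK builds do, but it is not one of the charsets every Java platform is required to support); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ShiftJisRoundTrip {
    // Shift_JIS is not in StandardCharsets; standard JDKs ship it, but it is not guaranteed everywhere.
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");

    // Bytes -> text: decode legacy Shift_JIS bytes into a String (UTF-16 internally).
    static String decodeShiftJis(byte[] sjisBytes) {
        return new String(sjisBytes, SHIFT_JIS);
    }

    // Text -> bytes -> text through the given charset; true if nothing was lost.
    static boolean survives(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset).equals(text);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E";      // "Japanese" written in Japanese
        byte[] sjis = original.getBytes(SHIFT_JIS);    // what a Shift_JIS client would send
        String decoded = decodeShiftJis(sjis);

        System.out.println("Shift_JIS bytes: " + sjis.length);
        System.out.println("decoded correctly: " + decoded.equals(original));
        System.out.println("UTF-8 round trip: " + survives(decoded, StandardCharsets.UTF_8));
        System.out.println("UTF-16 round trip: " + survives(decoded, StandardCharsets.UTF_16));
    }
}
```

Decoding the same bytes with the wrong charset, for example new String(sjis, StandardCharsets.ISO_8859_1), turns each byte into a separate Latin-1 character, which is one way to end up with text that "looks all wrong" on screen.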

Brandon

On 4/20/05, Miquel Angel Bada Zuazo <[EMAIL PROTECTED]> wrote:
> UTF-8 is for almost all languages (it uses 8 bits to represent a
> letter, I think), but "complicated" languages such as Japanese and Thai
> use 16 bits, so that's the reason UTF-16 exists at all.
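UTF-8 is not limited to 8 bits per letter: it is a variable-length encoding that uses 1 to 4 bytes per code point, while UTF-16 uses 2 or 4 bytes and UTF-32 always uses 4. All three represent the same set of characters; only the storage size differs. A small sketch that measures this with the standard StandardCharsets constants; the class name and the sample characters are illustrative choices.

```java
import java.nio.charset.StandardCharsets;

class BytesPerCharacter {
    // Bytes needed for one code point in UTF-8 (variable length: 1 to 4).
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in UTF-16 (big-endian, no byte-order mark): 2, or 4 for a surrogate pair.
    static int utf16Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int[] samples = {'A', 0x00E9, 0x0639, 0x0E01, 0x3042, 0x20000};
        String[] labels = {"Latin A", "e-acute", "Arabic ain", "Thai ko kai", "Hiragana a", "CJK Ext. B"};
        for (int i = 0; i < samples.length; i++) {
            // UTF-32 is fixed width, so every code point takes 4 bytes there.
            System.out.printf("U+%04X %-12s UTF-8: %d  UTF-16: %d  UTF-32: 4%n",
                    samples[i], labels[i], utf8Length(samples[i]), utf16Length(samples[i]));
        }
    }
}
```

Japanese and Thai characters take 3 bytes in UTF-8 but only 2 in UTF-16, so UTF-16 can be more compact for such text, while ASCII-heavy text is smaller in UTF-8.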
> 
> Miquel Angel
> 
> On 4/20/05, Brandon Goodin <[EMAIL PROTECTED]> wrote:
> > I've done quite a bit with i18n, working between UTF-8 and UTF-16. Even
> > after all that... I'm still mystified. :D Encoding is a world unto
> > itself. All I want is something that works :) Maybe one of these days
> > I'll understand more... for now it's all about trial and error.
> >
> > On 4/20/05, Brice Ruth <[EMAIL PROTECTED]> wrote:
> > > I don't see anywhere in there that UTF-8 cannot encode everything that
> > > UTF-16 and UTF-32 can ... just that the storage requirements differ ?!
> > >
> > > Brice
> > >
> > > On 4/20/05, Brandon Goodin <[EMAIL PROTECTED]> wrote:
> > > > http://icu.sourceforge.net/docs/papers/forms_of_unicode/
> > > >
> > > > On 4/20/05, Brice Ruth <[EMAIL PROTECTED]> wrote:
> > > > > I had heard that Chinese does a lot with UTF-16, but I hadn't heard
> > > > > about Arabic ... and I don't exactly understand why UTF-8 doesn't
> > > > > support that ... is it simply because their character sets keep
> > > > > expanding and UTF-8 is static?
> > > > >
> > > > > On 4/20/05, Brandon Goodin <[EMAIL PROTECTED]> wrote:
> > > > > > Latin characters are fine. However, UTF-8 is not sufficient for several
> > > > > > languages like Arabic and Chinese. For their FULL range of character
> > > > > > representations these languages require UTF-16, and in the case of
> > > > > > Chinese it is pushing for UTF-32.
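One way to test the claim that UTF-8 cannot cover Arabic or Chinese is to encode every Unicode scalar value (U+0000 to U+10FFFF, minus the surrogate range U+D800 to U+DFFF, which exists only for UTF-16's own pairing mechanism) and check that the text decodes back unchanged. Arabic sits in the Basic Multilingual Plane, and rarer Chinese characters such as the CJK Extension B block lie above U+FFFF, so both are included. A sketch using only the standard library; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8Coverage {
    // Builds a string containing every Unicode scalar value (all code points except surrogates).
    static String allScalarValues() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // surrogate code points are not characters on their own
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    // True if the text encodes to UTF-8 and decodes back unchanged.
    static boolean roundTripsThroughUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(text);
    }

    public static void main(String[] args) {
        String all = allScalarValues();
        System.out.println("code points: " + all.codePointCount(0, all.length())); // 1112064
        System.out.println("UTF-8 round trip: " + roundTripsThroughUtf8(all));     // true
    }
}
```

UTF-8, UTF-16 and UTF-32 are three encodings of the same code space, so the same check passes with StandardCharsets.UTF_16 as well; what differs is the number of bytes per character, not coverage.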
> > > > > >
> > > > > > Brandon
> > > > > >
> > > > > > On 4/20/05, Brice Ruth <[EMAIL PROTECTED]> wrote:
> > > > > > > OK ... that's more reasonable. Obviously, you need to use an editor
> > > > > > > (such as Eclipse) that is capable of editing UTF-8 files; otherwise,
> > > > > > > you'll get junk, and that won't be fun.
> > > > > > >
> > > > > > > Whew ... glad UTF-8 isn't compromised :)
> > > > > > >
> > > > > > > On 4/20/05, Brandon Goodin <[EMAIL PROTECTED]> wrote:
> > > > > > > > I found this quote when doing a search on Google:
> > > > > > > >
> > > > > > > > --- quote ---
> > > > > > > >
> > > > > > > > Your actual problem is very typical. By default (without encoding
> > > > > > > > specified in the XML declaration), XML is encoded in UTF-8. If you use
> > > > > > > > an editor which is not encoding-aware and typically assuming an
> > > > > > > > ISO-8859-1 encoding, and you insert characters such as accented
> > > > > > > > letters, curly quotes, etc., you will get this error. As a workaround,
> > > > > > > > you can put an XML declaration with the ISO-8859-1 encoding at the top
> > > > > > > > of your XML file:
> > > > > > > >
> > > > > > > > <?xml version="1.0" encoding="ISO-8859-1"?>
> > > > > > > >
> > > > > > > > You can also use an editor which knows how to handle UTF-8.
> > > > > > > >
> > > > > > > > In your case it is also possible that somebody inserted incorrect
> > > > > > > > characters by accident, and you can just remove those and then decide
> > > > > > > > which encoding you want to use. UTF-8 gives you the whole range of
> > > > > > > > Unicode, while ISO-8859-1 gives you a limited set of characters that
> > > > > > > > work for the Western languages.
> > > > > > > > --- quote ---
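The quoted advice can be reproduced with the JDK's built-in DOM parser: the same Latin-1 bytes parse when the declaration says ISO-8859-1, fail when there is no declaration (the parser then assumes UTF-8), and parse again once the file is saved as real UTF-8. A sketch using javax.xml.parsers; the element name note and the class name are made up for the example. The exact exception type and message depend on the parser, so it only catches Exception, and the parser may also print a "[Fatal Error]" line to standard error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlEncodingDemo {
    // Parses raw bytes with the JDK's DOM parser and returns the root element's text.
    static String rootText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    // True if parsing throws, i.e. the bytes do not match the encoding the parser chose.
    static boolean fails(byte[] xmlBytes) {
        try {
            rootText(xmlBytes);
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<note>caf\u00E9</note>";                  // contains an e-acute
        String latin1Decl = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>";

        // 1. Latin-1 bytes with a matching declaration: parses.
        System.out.println(rootText((latin1Decl + body).getBytes(StandardCharsets.ISO_8859_1)));
        // 2. Latin-1 bytes, no declaration: parser assumes UTF-8 and rejects byte 0xE9.
        System.out.println("fails: " + fails(body.getBytes(StandardCharsets.ISO_8859_1)));
        // 3. Real UTF-8 bytes, no declaration: parses.
        System.out.println(rootText(body.getBytes(StandardCharsets.UTF_8)));
    }
}
```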
> > > > > > > >
> > > > > > > > Maybe that will help,
> > > > > > > > Brandon
> > > > > > > >
> > > > > > > > On 4/20/05, Brice Ruth <[EMAIL PROTECTED]> wrote:
> > > > > > > > > What special characters aren't supported by UTF-8?! I have never heard
> > > > > > > > > of such a thing. My understanding is that UTF-8 represents the full
> > > > > > > > > Unicode character set, encoding each character as a sequence of one to
> > > > > > > > > four bytes. And since Unicode is supposed to encompass all known
> > > > > > > > > characters for all known languages (with space for new Chinese
> > > > > > > > > characters created daily), what's not covered?!
> > > > > > > > >
> > > > > > > > > There most certainly shouldn't be anything that ISO-8859-1 (Latin-1)
> > > > > > > > > or Windows-1252 covers that is not in Unicode.
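Because every ISO-8859-1 byte maps to exactly one Unicode code point (byte 0xNN becomes U+00NN), a file saved by a Latin-1 editor can be converted to UTF-8 without loss. A sketch using java.nio.file; the helper names are hypothetical, and it reads the whole file into memory, which is fine for config files but not for very large inputs. Windows-1252 differs from ISO-8859-1 in the 0x80 to 0x9F range (curly quotes, the euro sign and so on), so pass Charset.forName("windows-1252") instead if the file came from a Windows editor.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TranscodeToUtf8 {
    // Re-encodes a file in place: decode with the legacy charset, write back as UTF-8.
    static void transcode(Path file, Charset from) throws IOException {
        String text = new String(Files.readAllBytes(file), from);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Checks that each of the 256 ISO-8859-1 bytes decodes to the code point with the same value.
    static boolean latin1MapsOneToOne() {
        for (int b = 0; b < 256; b++) {
            String s = new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1);
            if (s.length() != 1 || s.charAt(0) != b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("latin1-", ".xml");
        String sample = "a\u00E7\u00E3o";                                // Portuguese word with c-cedilla and a-tilde
        Files.write(tmp, sample.getBytes(StandardCharsets.ISO_8859_1));  // 4 bytes on disk
        transcode(tmp, StandardCharsets.ISO_8859_1);
        byte[] utf8 = Files.readAllBytes(tmp);                           // now 6 bytes
        System.out.println(utf8.length + " bytes, text preserved: "
                + new String(utf8, StandardCharsets.UTF_8).equals(sample));
        System.out.println("Latin-1 maps one-to-one: " + latin1MapsOneToOne());
        Files.delete(tmp);
    }
}
```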
> > > > > > > > >
> > > > > > > > > Brice
> > > > > > > > >
> > > > > > > > > On 4/20/05, Daniel H. F. e Silva <[EMAIL PROTECTED]> wrote:
> > > > > > > > > > You could also check your XML encoding. If you work with special
> > > > > > > > > > characters not in UTF-8, you will get in trouble.
> > > > > > > > > > I had this problem, since my native language is Portuguese and we have
> > > > > > > > > > some special characters not supported by UTF-8.
> > > > > > > > > > So, if this is your case, try ISO-8859-1 or one that better fits
> > > > > > > > > > your needs.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >  Daniel Silva.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --- Larry Meadors <[EMAIL PROTECTED]> wrote:
> > > > > > > > > > > Make sure that there is no white space and no odd chars at the top
> > > > > > > > > > > of your config file.
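Larry's suggestion can be checked mechanically: an XML declaration, if present, must be the very first thing in the file, and stray whitespace or bytes in front of it make the parse fail. A sketch that reports what, if anything, precedes "<?xml"; the class name and messages are hypothetical. It treats a UTF-8 byte-order mark as acceptable, since the XML specification allows one, though tools that read the file as plain text may not expect it.

```java
import java.nio.charset.StandardCharsets;

class XmlPrologCheck {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] XML_DECL = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Reports whether anything other than an optional UTF-8 BOM precedes the XML declaration.
    static String check(byte[] head) {
        int start = startsWith(head, 0, UTF8_BOM) ? UTF8_BOM.length : 0;
        int decl = indexOf(head, XML_DECL);
        if (decl < 0) {
            return "no XML declaration; the parser will assume UTF-8 unless a BOM says otherwise";
        }
        if (decl == start) {
            return start == 0 ? "ok" : "ok (UTF-8 BOM present)";
        }
        return (decl - start) + " byte(s) before <?xml; remove them";
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static int indexOf(byte[] data, byte[] needle) {
        for (int i = 0; i + needle.length <= data.length; i++) {
            if (startsWith(data, i, needle)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] samples = {"<?xml version=\"1.0\"?><a/>", "\n  <?xml version=\"1.0\"?><a/>"};
        for (String s : samples) {
            System.out.println(check(s.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

For a real file, pass the bytes read with Files.readAllBytes on the config file the parser rejects.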
> > > > > > > > > > >
> > > > > > > > > > > Larry
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 4/18/05, KK <[EMAIL PROTECTED]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I get the following error when I try to build sqlConfigMap. Does it
> > > > > > > > > > > > look familiar to anyone?
> > > > > > > > > > > >
> > > > > > > > > > > > com.ibatis.sqlmap.client.SqlMapException: There was an error while building the SqlMap instance.
> > > > > > > > > > > > --- The error occurred in the SQL Map Configuration file.
> > > > > > > > > > > > --- Cause: com.ibatis.sqlmap.client.SqlMapException: XML Parser Error.
> > > > > > > > > > > > Cause: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > Caused by: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > Caused by: com.ibatis.sqlmap.client.SqlMapException: XML Parser Error.
> > > > > > > > > > > > Cause: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > Caused by: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8 sequence.
> > > > > > > > > > > > at com.ibatis.sqlmap.engine.builder.xml.XmlSqlMapClientBuilder.buildSqlMap(XmlSqlMapClientBuilder.java:203)
> > > > > > > > > > > > at com.ibatis.sqlmap.client.SqlMapClientBuilder.buildSqlMapClient(SqlMapClientBuilder.java:49)
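The UTFDataFormatException in this trace means the parser, reading the config file as UTF-8 (the XML default when no other encoding is declared), hit a byte sequence that is not valid UTF-8: "Invalid byte 3 of 3-byte UTF-8 sequence" means a lead byte announced a three-byte character but the third byte was not a continuation byte. A sketch for locating the first such byte with java.nio.charset.CharsetDecoder, which reports malformed input and leaves the input buffer positioned at the start of the bad sequence; the class name and command-line usage are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class FindBadUtf8 {
    // Returns the offset of the first byte that starts an invalid UTF-8 sequence, or -1 if valid.
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position();   // position is left at the start of the bad sequence
            }
            if (result.isOverflow()) {
                out.clear();            // decoded text is not needed; just make room
                continue;
            }
            return -1;                  // underflow with endOfInput=true: everything decoded
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FindBadUtf8 <file the parser rejects>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstMalformedOffset(data);
        if (offset < 0) {
            System.out.println("valid UTF-8");
            return;
        }
        StringBuilder hex = new StringBuilder();
        for (int i = offset; i < Math.min(data.length, offset + 4); i++) {
            hex.append(String.format("%02X ", data[i] & 0xFF));
        }
        System.out.println("invalid UTF-8 at byte " + offset + ": " + hex.toString().trim());
    }
}
```

For example, the bytes 41 E2 80 41 (a plain A, a three-byte lead byte, one valid continuation byte, then another A) are the kind of sequence behind an "Invalid byte 3" message, and the method reports offset 1 for them.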
> > > > > > > > > > > >
> > > > > > > > > > > > Your help is greatly appreciated.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > KK
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Brice Ruth
> > > > > > > > > Software Engineer, Madison WI
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
