Wow, Mike has done such a great job of covering the questions that I don't have much to add. I'll just answer one of Laura's questions:
> Is there a complete listing of the territories that are supported... maybe in a Java spec?
As Mike says, this feature (DERBY-1478) does not change the existing support for territories in any way. The Derby 10.2 Reference Manual has a subsection called "territory=ll_CC" under "Setting attributes for the database connection URL". It explains what ll and CC stand for and where the valid values for them can be found.
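Because the supported set is essentially "whatever Java supports", one way to see the list for a particular JVM is to ask `java.util.Locale` directly. The sketch below is not Derby code. It assumes that a territory value of the form ll_CC corresponds to the Java `Locale` with language ll and country CC, and `parseTerritory` and `isAvailable` are hypothetical helpers. For collation specifically, `java.text.Collator.getAvailableLocales()` returns the locales that have collators, which may be a narrower list.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

public class TerritoryCheck {

    // Hypothetical helper: split a territory value of the form ll_CC
    // (for example "es_MX") into a java.util.Locale.
    static Locale parseTerritory(String territory) {
        String[] parts = territory.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected ll_CC, got: " + territory);
        }
        return new Locale(parts[0], parts[1]);
    }

    // True if the running JVM lists an exactly matching Locale.
    static boolean isAvailable(Locale candidate) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(candidate);
    }

    public static void main(String[] args) {
        Locale mx = parseTerritory("es_MX");
        System.out.println(mx.getLanguage() + " / " + mx.getCountry()
                + " available: " + isAvailable(mx));

        // Print every locale with a country part that this JVM knows about.
        Locale[] all = Locale.getAvailableLocales();
        Arrays.sort(all, Comparator.comparing(Locale::toString));
        for (Locale l : all) {
            if (!l.getCountry().isEmpty()) {
                System.out.println(l);
            }
        }
    }
}
```

The output depends on the JDK vendor and version, which matches the point that the territory list is defined by Java rather than by Derby.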
Laura, thanks for working on the documentation for DERBY-1478. Let us know if you have any further questions.

Mamta

On 5/16/07, Mike Matrigali <[EMAIL PROTECTED]> wrote:
Laura Stewart wrote:
> As part of adding the new attribute collation=TERRITORY_BASED, I think
> that we need to describe how Derby handles collation.
>
> I am trying to get my head around the best way to describe collation
> in Derby... for 10.3
>
> In general terms, a collating sequence is a defined ordering for
> character data that determines whether a particular character sorts
> higher, lower, or the same as another character. Each character set
> will also have a default collation.

I would also not use "character set". I would approach documenting it based on the behavior of datatypes rather than talk about character sets. So comparison, ordering, and LIKE processing on CHAR, VARCHAR, LONG VARCHAR and CLOB are affected.

>
> In Derby, it is my understanding that all of our string data types are
> represented as Unicode sequences. Is that correct?

I believe the documentation should only speak to the datatypes rather than the underlying storage structure. For background on the current implementation: all operations on character types are performed in memory using either a java String or java char. JDBC defines how one inputs data into the datatypes and retrieves data from them.

>
> We should have a complete list of the data types that are impacted by
> collation.
> CHAR
> VARCHAR
> CLOB ?

I believe it is CHAR, VARCHAR, LONG VARCHAR and CLOB.

>
> Does Derby support the national character datatypes such as
> NCHAR/NVARCHAR2?

No.

>
> FYI - there is a feeling among some in the Internet community that the
> term "character set" is not appropriate. They tout character code,
> character encoding, or character repertoire.
>
> Does Derby support specifying codes? Is that what the attribute
> territory=ll_CC (example territory=es_MX) does?
>
> Is there a complete listing of the territories that are supported...
> maybe in a Java spec?

Hopefully Mamta can expand here.
I hope that we can define our support in terms of the standard interfaces we use from Java to perform the ordering when a database has been defined to order based on its territory. I don't believe 10.3 will change the territories supported; it is the same set as 10.2 (basically we support what Java supports). 10.3 just allows collation to be based on territory; all other territory support is unchanged.

>
> When you create a database, can you specify that the
> default character set for CHAR columns be ASCII, and the character set
> used for NCHAR be UTF8?

No, there is no such thing. We are not specifying a character set. You specify a territory, which is existing functionality in 10.2. In 10.3 you specify at database creation time whether you want the collation of all user character data to be determined by the territory or not. In the current implementation this does not change the storage format, but I don't think that should be part of the documentation.

Do not get confused by what other databases may have to include in such a change. Derby has always used Java String/char support, which is Unicode based, so nothing different is needed to operate on non-ASCII character data. How Derby chooses to read and write those characters to disk is even less important for user interface documentation and could be changed in the future. We happen to currently use a modified UTF8 scheme (modified to support very long strings), but that is never exposed to a user.

>
> The Derby documentation mentions code sets, but only with relationship
> to import/export topics or ij sessions...

Right. The 10.3 functionality does not change any of this; it only affects the ordering within the server. Different operating systems and environments may operate on different codesets outside of Derby, but once the data has gotten in (through import, ij, or JDBC) it is treated the same on all systems. On exit (export, ij, JDBC) the data may then be transformed to a native codeset.
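The codeset point can be illustrated with plain JDK calls. This is a sketch, not Derby code: `importFrom` and `exportTo` are hypothetical stand-ins for the import/ij/JDBC boundary, and ISO-8859-1 and UTF-8 are just two example codesets. The external bytes differ, but the in-memory Java String is the same either way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CodesetRoundTrip {

    // Hypothetical stand-in for data coming in: bytes in some native
    // codeset are decoded into a Java String.
    static String importFrom(byte[] nativeBytes, Charset codeset) {
        return new String(nativeBytes, codeset);
    }

    // Hypothetical stand-in for data going out: the String is encoded
    // into whatever codeset the consumer expects.
    static byte[] exportTo(String value, Charset codeset) {
        return value.getBytes(codeset);
    }

    public static void main(String[] args) {
        String original = "Espa\u00f1a"; // "España", contains one non-ASCII character

        byte[] latin1 = exportTo(original, StandardCharsets.ISO_8859_1);
        byte[] utf8 = exportTo(original, StandardCharsets.UTF_8);

        // The external byte representations differ...
        System.out.println("ISO-8859-1 bytes: " + latin1.length); // 6
        System.out.println("UTF-8 bytes:      " + utf8.length);   // 7
        System.out.println("same bytes? " + Arrays.equals(latin1, utf8));

        // ...but once decoded, both yield the same String value.
        String fromLatin1 = importFrom(latin1, StandardCharsets.ISO_8859_1);
        String fromUtf8 = importFrom(utf8, StandardCharsets.UTF_8);
        System.out.println("same string? " + fromLatin1.equals(fromUtf8));
    }
}
```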
None of this is affected by the 10.3 collation changes.

>
> Any insight that you can provide on this would be appreciated.
>
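To illustrate Mike's suggestion of describing territory-based ordering in terms of standard Java interfaces, here is a sketch that contrasts ordering by Java char values (`String.compareTo`) with ordering by a locale's `java.text.Collator`. It assumes, without showing Derby internals, that the default UCS_BASIC collation behaves roughly like the first and that a database created with a URL such as `jdbc:derby:myDB;create=true;territory=fr_FR;collation=TERRITORY_BASED` (database name hypothetical) orders CHAR, VARCHAR, LONG VARCHAR and CLOB data roughly like the second. `sortByCharValue` and `sortByTerritory` are hypothetical helpers.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TerritoryOrdering {

    // Orders strings by comparing Java char values one at a time
    // (String.compareTo); assumed here to approximate UCS_BASIC.
    static List<String> sortByCharValue(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(String::compareTo);
        return copy;
    }

    // Orders strings with the java.text.Collator for the given locale;
    // assumed here to approximate TERRITORY_BASED collation.
    static List<String> sortByTerritory(List<String> values, Locale territory) {
        Collator collator = Collator.getInstance(territory);
        List<String> copy = new ArrayList<>(values);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zebra", "\u00e9clair", "apple", "Zoo");

        // Uppercase sorts before lowercase, and the accented letter sorts last.
        System.out.println(sortByCharValue(words));
        // The collator groups the accented letter with its base letter
        // and does not put uppercase ahead of everything else.
        System.out.println(sortByTerritory(words, Locale.FRANCE));
    }
}
```

The same input ordered two ways shows why only comparison, ordering, and LIKE behavior change with collation=TERRITORY_BASED, while the stored character values stay the same.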
