Re: Another collation question - Derby-1478 and Derby-2377

Mamta Satoor Wed, 16 May 2007 14:12:01 -0700

Wow, Mike has done such a great job of covering the questions that I don't
have much to add. Just an answer to one of Laura's questions:

> Is there a complete listing of the territories that are supported...
> maybe in a Java spec?

As Mike says, this feature (DERBY-1478) does not change the existing support
for territories in any way. The Derby 10.2 Reference Manual, under "Setting
attributes for the database connection URL", has a sub-section called
"territory=ll_CC" that describes ll and CC and where the valid values
for them can be found.


Laura, thanks for working on the documentation for DERBY-1478. Let us know
if you have any further questions.

Mamta


On 5/16/07, Mike Matrigali <[EMAIL PROTECTED]> wrote:




Laura Stewart wrote:
> As part of adding the new attribute collation=TERRITORY_BASED, I think
> that we need to describe how Derby handles collation.
>
> I am trying to get my head around the best way to describe collation
> in Derby... for 10.3
>
> In general terms, a collating sequence is a defined ordering for
> character data that determines whether a particular character sorts
> higher, lower, or the same as another character.  Each character set
> will also have a default collation.
I would also not use the term "character set".  I would approach
documenting it based on the behavior of datatypes rather than talk about
character sets.  So CHAR, VARCHAR, LONG VARCHAR and CLOB comparison,
ordering, and LIKE processing is affected.
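
To make the effect on ordering concrete, here is a minimal sketch in plain
Java, with no Derby involved. It assumes that the non-territory ordering
behaves like String.compareTo (comparison by Unicode value) and that a
territory-based ordering behaves like java.text.Collator for that locale.
The class and method names are made up for illustration only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationOrderSketch {
    // Orders strings the way String.compareTo does: by UTF-16 code unit value.
    static List<String> sortByCodeValue(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Orders strings with the Collator that Java provides for the locale.
    // Assumption: this approximates a territory-based ordering.
    static List<String> sortByTerritory(List<String> input, Locale territory) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Collator.getInstance(territory));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Zebra", "apple");
        // Upper-case 'Z' has a lower code value than lower-case 'a'.
        System.out.println(sortByCodeValue(words));            // [Zebra, apple, banana]
        // The en_US collator orders by letter first, then by case.
        System.out.println(sortByTerritory(words, Locale.US)); // [apple, banana, Zebra]
    }
}
```

The same difference would be expected to show up in ORDER BY, comparison
operators, and LIKE on the datatypes listed above.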

>
> In Derby, it is my understanding that all of our string data types are
> represented as Unicode sequences.  Is that correct?
I believe the documentation should only speak to the datatypes rather
than the underlying storage structure.  As for the current
implementation, all operations on character types use either String or
Java char in memory.  JDBC defines how one inputs data into the
datatypes and retrieves data from them.
>
> We should have a complete list of the data types that are impacted by
> collation.
> CHAR
> VARCHAR
> CLOB ?
I believe it is
CHAR
VARCHAR
LONG VARCHAR
CLOB
>
> Does Derby support the national character datatypes such as
> NCHAR/NVARCHAR2?
No.
>
> FYI - there is a feeling among some in the Internet community that the
> term "character set" is not appropriate.  They tout character code,
> character encoding, or character repertoire.
>
> Does Derby support specifying codes?  Is that what the attribute
> territory=ll_CC (example territory=es_MX) does?
>
> Is there a complete listing of the territories that are supported...
> maybe in a Java spec?
Hopefully Mamta can expand here.  I hope that we can define our support
in terms of the standard interfaces we are using from Java to perform
the ordering if a database has been defined to order based on its
territory.

I don't believe 10.3 will change the territories supported; it is the
same set as 10.2 (basically we support what Java supports).  10.3 just
allows collation to be based on territory; all other territory support
is unchanged.
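
As a rough way to see "what Java supports", the sketch below lists the
locales for which the running JVM can supply a java.text.Collator. It is
an assumption on my part that this list matches the territories usable
for collation; the result depends on the JVM and its installed locale
data. The class and method names are illustrative only.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class AvailableTerritoriesSketch {
    // Returns the locale codes (for example "es_MX") for which this JVM
    // can create a Collator, sorted alphabetically.
    static String[] availableTerritoryCodes() {
        return Arrays.stream(Collator.getAvailableLocales())
                .map(Locale::toString)
                .filter(code -> !code.isEmpty()) // skip the root locale, if present
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        for (String code : availableTerritoryCodes()) {
            System.out.println(code);
        }
    }
}
```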
>
> When you create a database, can you specify that the
> default character set for CHAR columns be ASCII, and the character set
> used for NCHAR be UTF8?
No, there is no such thing.  We are not specifying a character set.  You
specify a territory; this is existing functionality in 10.2.  In 10.3 you
specify at database creation time whether you want collation of all user
character data to be determined by the territory or not.  In the current
implementation it does not change the storage format, but I don't think
that should be part of the documentation.
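
For the documentation, a creation-time connection URL might look like the
one built below. The attributes territory= and collation=TERRITORY_BASED
are the ones discussed in this thread, and create=true is the usual
attribute for creating a database; the database name sampleDB and the
helper method are hypothetical. The sketch only builds the URL string, so
it runs without Derby on the classpath.

```java
class ConnectionUrlSketch {
    // Builds a URL that asks Derby to create a database whose character
    // data is collated according to the given territory.
    static String createUrl(String dbName, String territory) {
        return "jdbc:derby:" + dbName
                + ";create=true"
                + ";territory=" + territory
                + ";collation=TERRITORY_BASED";
    }

    public static void main(String[] args) {
        // "sampleDB" is a placeholder database name.
        String url = createUrl("sampleDB", "es_MX");
        System.out.println(url);
        // With the Derby embedded driver available, the URL would be passed
        // to java.sql.DriverManager.getConnection(url).
    }
}
```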

Do not get confused by what other databases may have to include in such
a change.  Derby has always used Java String/char support, which is
Unicode based, so nothing different is needed to operate on non-ASCII
character data.  How Derby chooses to read/write those characters to
disk is even less important for user interface documentation and could
be changed in the future.  We happen to currently use a modified UTF-8
scheme (modified to support very long strings), but that is never
exposed to a user.
>
> The Derby documentation mentions code sets, but only with relationship
> to import/export topics or ij sessions...
Right.  The 10.3 functionality does not change any of this; it only
affects the ordering within the server.  Different operating systems and
environments may operate on different codesets outside of Derby, but
once the data has gotten in (through an import, ij, JDBC) the data
is treated the same on all systems.  On exit (export, ij, JDBC) the data
may then get transformed to a native codeset.  None of this is affected
by the 10.3 collation changes.
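
A small plain-Java illustration of that point, not Derby code: the same
in-memory String produces different bytes depending on the codeset chosen
on the way out, while the String itself is unchanged. The two charsets
used here are just examples, and the class and method names are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodesetSketch {
    // Number of bytes the text occupies once encoded in the given codeset.
    static int encodedLength(String text, Charset codeset) {
        return text.getBytes(codeset).length;
    }

    // Encodes and then decodes the text using the same codeset.
    static String roundTrip(String text, Charset codeset) {
        return new String(text.getBytes(codeset), codeset);
    }

    public static void main(String[] args) {
        String text = "Espa\u00f1a"; // six characters, one of them non-ASCII
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 6
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 7
        // Both codesets can represent this text, so decoding restores it.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```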
>
> Any insight that you can provide on this would be appreciated.
>
