Mike Matrigali wrote:

Physically, I am not sure of the best way to store it.

Are we sure the collation id can be represented as an INT?  I may have
missed it but do we expect a different number here for each different
language, or is there a single number that says sort based on language
and go look up language somewhere else?

Single values, the locale is fixed by the database:

0 - collation using code point order UCS_BASIC
1 - collation using the locale of the current database ("unicode")
2 - collation using LOWER() with the locale of the current database ("unicode")
3 - collation using UPPER() with the locale of the current database ("unicode")
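
To make the four fixed values concrete, here is a minimal Java sketch. The class name, constant names and the compare() helper are hypothetical and are not existing Derby code. It also assumes that "collation using LOWER()" means comparing the case-folded values with the database locale's collator, which is only one possible reading.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical holder for the fixed collation ids listed above.
public final class CollationIds {
    public static final int UCS_BASIC = 0;        // code point order
    public static final int DB_LOCALE = 1;        // locale of the database
    public static final int DB_LOCALE_LOWER = 2;  // LOWER() + database locale
    public static final int DB_LOCALE_UPPER = 3;  // UPPER() + database locale

    private CollationIds() {}

    // Compare two strings under the given collation id (illustration only).
    public static int compare(int collationId, Locale dbLocale, String a, String b) {
        switch (collationId) {
            case UCS_BASIC:
                return compareCodePoints(a, b);
            case DB_LOCALE:
                return Collator.getInstance(dbLocale).compare(a, b);
            case DB_LOCALE_LOWER:
                return Collator.getInstance(dbLocale)
                        .compare(a.toLowerCase(dbLocale), b.toLowerCase(dbLocale));
            case DB_LOCALE_UPPER:
                return Collator.getInstance(dbLocale)
                        .compare(a.toUpperCase(dbLocale), b.toUpperCase(dbLocale));
            default:
                throw new IllegalArgumentException("unknown collation id " + collationId);
        }
    }

    // Code point order; String.compareTo would compare UTF-16 code units instead.
    static int compareCodePoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other; the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }
}
```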

These map to the way SQL does it, which is fixed names for collations.

Now I guess in theory there could be additional future options such as:
  collate according to a specific locale (e.g. French) in a database of a different locale
  collate according to a user-defined class

My guess is that these could be handled with an integer and indirection. The DataValueFactory would assign values dynamically within a database, so it would use 100 for locale French and also store in service.properties the mapping between collation 100 and locale French. And of course in a different database 100 might mean collate using com.acme.myapp.MyCollator.
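
A rough Java sketch of that indirection follows. Everything in it is an assumption made for illustration: the CollationRegistry class, the "collation." key prefix, the "locale:" and "class:" value formats, and the choice of 100 as the first dynamic id. A plain java.util.Properties object stands in for the contents of service.properties.

```java
import java.util.Properties;

// Hypothetical per-database registry of dynamically assigned collation ids.
public final class CollationRegistry {
    private static final String KEY_PREFIX = "collation."; // assumed key format
    private static final int FIRST_DYNAMIC_ID = 100;       // assumed starting id

    private final Properties serviceProperties;

    public CollationRegistry(Properties serviceProperties) {
        this.serviceProperties = serviceProperties;
    }

    // Return the id already mapped to the target, or assign the next free one.
    public int register(String target) {
        int id = FIRST_DYNAMIC_ID;
        while (true) {
            String existing = serviceProperties.getProperty(KEY_PREFIX + id);
            if (existing == null) {
                serviceProperties.setProperty(KEY_PREFIX + id, target);
                return id;
            }
            if (existing.equals(target)) {
                return id;
            }
            id++;
        }
    }

    // Resolve an id back to its target, or null if the id is not mapped.
    public String lookup(int collationId) {
        return serviceProperties.getProperty(KEY_PREFIX + collationId);
    }
}
```

Two databases each get their own Properties object, so the same id can resolve to "locale:fr" in one and "class:com.acme.myapp.MyCollator" in the other.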

So I think single values will suffice.

options include:
1) most straightforward would be an array with an entry for each column, whether it is character or not. If we use compressedInteger format we can get away with only 1 byte per "null" entry. Note on the way out it is easy to tell if it is a character, but on the way back we only have format ids. I was hoping to have a single call to datafactory(format id, collate id) and get back the correct object.
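
Option 1 could look roughly like the Java sketch below. It is not Derby's compressedInteger format: a simple 7-bits-per-byte variable-length encoding stands in for it. The class name, the NO_COLLATION marker, and the convention of storing id + 1 so that a non-character column costs a single zero byte are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical dense layout: one entry per column, character or not.
public final class ColumnCollationArray {
    public static final int NO_COLLATION = -1; // marker for non-character columns

    private ColumnCollationArray() {}

    // Encode the count followed by (id + 1) for each column.
    public static byte[] encode(int[] collationIds) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeVarInt(out, collationIds.length);
            for (int id : collationIds) {
                writeVarInt(out, id + 1); // NO_COLLATION becomes a single 0 byte
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode the array written by encode().
    public static int[] decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int[] ids = new int[readVarInt(in)];
            for (int i = 0; i < ids.length; i++) {
                ids[i] = readVarInt(in) - 1;
            }
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in variable-length encoding for non-negative ints, low 7 bits first.
    private static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    private static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }
}
```

On the read side, each decoded id could then be passed together with the column's format id to a single factory call of the kind described above.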

Will it ever make sense to associate a collation with something other
than a character type?

Not that I can think of, and I think an int range provides for lots of expansion.

2) some sort of encoded sparse index with entries only for the character columns (anyone know if there is a Java utility to do this?). The downside is that this usually means even more data stored than option 1 in some cases.

One option: if there are no character columns, don't have the array at all.
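
To put rough numbers on the trade-off between options 1 and 2, here is a back-of-the-envelope Java sketch. It assumes that every count, column index and collation id fits in one byte, that a missing array is recorded as a single zero count byte, and that a sparse entry is a (column index, collation id) pair. None of these details are settled in this thread, and the class and method names are hypothetical.

```java
// Hypothetical size estimates for the dense and sparse layouts.
public final class CollationLayoutSize {
    private CollationLayoutSize() {}

    // Dense: count byte plus one byte per column, or just a zero count
    // when the table has no character columns.
    public static int denseBytes(int columnCount, int characterColumnCount) {
        return characterColumnCount == 0 ? 1 : 1 + columnCount;
    }

    // Sparse: count byte plus two bytes (index, id) per character column.
    public static int sparseBytes(int characterColumnCount) {
        return 1 + 2 * characterColumnCount;
    }
}
```

Under these assumptions the sparse form is smaller only when fewer than half of the columns are character columns; for a table of 10 columns with 8 character columns it needs 17 bytes against 11 for the dense array.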


3) some sort of format that on read would depend on first getting an
uncollated datatype of type format-id and then regetting it based on
some code.  So maybe some extra object creation and extra cpu overhead
to create the template in readExternal.

Not sure how this would work.

Dan.
