I agree, Kathey. The bottom line is that if we don't impose this 63-character limitation, the limit will vary with the characters used. For instance, if you use **just** special Latin characters (e.g. áéçóí), which take two bytes each in UTF-8, the limit will be 127 characters, which is essentially what happens right now, albeit in a much less elegant way. According to Knut's experiment, EBCDIC is able to encode these special characters, but it does seem to take more than one byte to do so.
I tried to create a database with 243 special Latin characters (255 - 12 for ;create=true) on a 10.5.3.0 server and it just threw a very nasty array bounds exception (check my other e-mail on the list).

Knut and Dag also suggested that we raise this limit to 0xFFFF (65535) characters, as allowed by the two bytes with which we encode the length. Would you agree with this approach?

To sum up: even if we don't raise the limit, it doesn't seem like my changes will break access to existing databases, as there is indeed a limit currently. The only issue is that if we are using strictly Chinese characters, we will be capped at 85 characters (85 * 3 bytes = 255 bytes). Since we didn't allow Chinese characters on the client driver before, this might not be bad from a regression perspective, but for long paths it might be an issue (as it is even with other characters).

Tiago

________________________________
From: Kathey Marsden <[email protected]>
To: [email protected]
Cc: Tiago Espinha <[email protected]>
Sent: Mon, 13 September, 2010 16:33:09
Subject: Re: Database name length

On 9/12/2010 9:22 AM, Tiago Espinha wrote:

Is this an okay behavior? Or would it be preferable to impose a stricter limit where we assume that all characters take 4 bytes (worst case scenario in UTF-8) and **always** cap the dbname length at 63 characters (255 bytes / 4 bytes)? This would mean more work for my implementation and possibly an exclusion from 10.7.

On the other hand, if we have this variable-length limit depending on the type of characters used, we should probably have some sort of release note alerting people about this fact.

Hi Tiago,

I don't think we should introduce any new limiting factors on embedded as it may break existing applications. I am curious as to the existing limits you found with embedded on Windows. Does that include the path leading up to the database name and the attributes, or just the final database name?
For network server we have this existing documentation which needs modification with the introduction of UNICODEMGR.

http://db.apache.org/derby/docs/dev/adminguide/cadminappsclient.html

which says:

For both driver and DataSource access, the database name (including path), user, password and other attribute values must consist of single-byte characters that can be converted to EBCDIC. The total byte length of the database name plus attributes when converted to EBCDIC must not exceed 255 bytes. You may be able to work around this restriction for long paths or paths that include multibyte characters by setting the derby.system.home system property when starting Network Server and accessing the database with a relative path that is shorter and does not include multibyte characters.

This should be modified to remove the single byte character restriction and change EBCDIC to UTF-8.

Thanks

Kathey
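
To make the byte arithmetic in this thread concrete, here is a minimal Java sketch of how a 255-byte cap turns into a variable character limit under UTF-8. The class name DbNameLength and its helper methods are hypothetical and are not part of Derby. The 255-byte figure is taken from the documentation quoted above, and the sketch only measures encoded length; it does not model how the client or Network Server actually builds or checks the string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Derby class: it only illustrates the arithmetic.
class DbNameLength {
    // Assumed cap, taken from the 255-byte limit discussed in the thread.
    static final int MAX_BYTES = 255;

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Characters that fit if every character needs bytesPerChar bytes.
    static int maxChars(int bytesPerChar) {
        return MAX_BYTES / bytesPerChar;
    }

    // True if the database name plus attributes stays within the cap.
    static boolean fits(String dbNameWithAttributes) {
        return utf8Length(dbNameWithAttributes) <= MAX_BYTES;
    }

    // Small helper so the sketch does not depend on String.repeat (Java 11+).
    static String repeat(String unit, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String latin = "\u00e1";   // a-acute: 2 bytes in UTF-8
        String chinese = "\u4e2d"; // a CJK character: 3 bytes in UTF-8

        System.out.println("1 byte/char:  " + maxChars(1));
        System.out.println("2 bytes/char: " + maxChars(2));
        System.out.println("3 bytes/char: " + maxChars(3));
        System.out.println("4 bytes/char: " + maxChars(4));

        // 243 two-byte characters plus the 12-byte ";create=true" attribute.
        String attempt = repeat(latin, 243) + ";create=true";
        System.out.println("243 Latin chars + attribute: "
                + utf8Length(attempt) + " bytes, fits=" + fits(attempt));

        System.out.println("85 Chinese chars fit: " + fits(repeat(chinese, 85)));
        System.out.println("86 Chinese chars fit: " + fits(repeat(chinese, 86)));
    }
}
```

Measuring the encoded byte length instead of counting characters is what keeps the check correct for mixed names: a path that is mostly ASCII with a few multibyte characters is allowed far more than 63 characters, which a fixed worst-case cap would rule out.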
