Re: [Firebird-devel] RFC: File names with non-ASCII non-ANSI letters

Jim Starkey Sat, 19 Mar 2016 23:08:22 -0700

On 3/16/2016 8:49 AM, Lester Caine wrote:

On 16/03/16 11:39, James Starkey wrote:

Or simply restrict database file names to ASCII.  It's not like users
have to deal with them, just like they don't have to deal with
identifiers in SQL, or C or Java.

As someone linguistically challenged, I have no problem with my own
code, but English is not the most used language on the planet, and for
many it creates another complication to programming in their own
language, so I can understand that 'restrict database file names to
ASCII' is as irritating these days as some of the other 'politically
correct' things we have to put up with. That SQL and other
programming languages are essentially 'english' is not a real case for
only supporting 'english' in the 21st century?


But many programming languages and os's still can't cope with this
problem anyway so perhaps we have to live with that? :(

Putting aside the point that most people on the planet don't have to specify Firebird database connection strings, let's look at where we are, how we got here, and what makes sense going forward.

In my youth, when "computer" and "IBM machine" were synonyms, EBCDIC ruled the world. If you wanted to put on a print train with non-standard glyphs, that was fine, but you were on your own. With minicomputers came ASCII, sometimes called 7-bit ASCII by the ignorant. Unix used ASCII, but had next to no support for anything but ASCII (this budged when X-10 rolled around). Microsoft was among the first companies to have OS recognition of national character sets with Windows in 1990. If you wonder why Microsoft is so non-standard, it's because they pre-dated the standard. Linux grew up outside the US, which was good for national character sets.

As an industry, we've learned about international character sets the hard way -- by making lots of mistakes. National character sets, we eventually figured out, were bad, and a single coding to rule them all, Unicode, was good. But there were lots of mistakes still in the pipe.

Here's what we've learned: National character sets are horrible. Unicode is good. Unicode-16, alas, didn't cut it. UTF-16 is incredibly stupid, except as a recovery path from Unicode-16 (sorry, Java). UTF-8 is the only rational encoding.
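
To make the encoding point concrete, here is a minimal Python sketch (not part of the original message; the helper name encoded_lengths is made up for illustration). It compares how many bytes a character takes in UTF-8 and in UTF-16. It only illustrates two well-known properties: UTF-8 leaves ASCII bytes unchanged, and UTF-16 stops being a fixed two-byte encoding as soon as a character outside the Basic Multilingual Plane shows up.

```python
def encoded_lengths(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text.

    'utf-16-le' is used so that no byte-order mark is added to the count.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    # An ASCII letter, a Latin-1 letter, a BMP symbol, and a non-BMP emoji.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        utf8_len, utf16_len = encoded_lengths(ch)
        print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

    # Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
    print("employee.fdb".encode("utf-8") == "employee.fdb".encode("ascii"))
```

The last sample character needs a surrogate pair in UTF-16, so code that assumes "one 16-bit unit per character" breaks on it, while ASCII-only tooling keeps working on the ASCII subset of UTF-8 input.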


This is where Firebird has to go:

1. The engine and wire protocol should be UTF-8 only.  There should be
   no overhead checking for character sets and no bugs where character
   sets weren't checked.
2. Local character set conversion (required for keyboards and printers)
   is client side only.
3. The engine takes a connection string and opens a database file with
   the full knowledge that any number of file name strings may map to
   the same file.  If an engine needs a unique identifier for a
   database file, it should use a UUID from the header page.
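
Point 3 can be sketched roughly as follows. This is an illustrative Python toy, not Firebird code: the "header" here is assumed to be just the first 16 bytes of the file, which is not Firebird's real header page layout, and the function names create_database, read_database_id and same_database are hypothetical.

```python
import os
import tempfile
import uuid

# ASSUMPTION: a toy layout where the file starts with a 16-byte UUID.
# Firebird's actual header page is laid out differently.
HEADER_UUID_SIZE = 16


def create_database(path: str) -> uuid.UUID:
    """Create a toy database file whose header holds a fresh UUID."""
    db_id = uuid.uuid4()
    with open(path, "wb") as f:
        f.write(db_id.bytes)
    return db_id


def read_database_id(path: str) -> uuid.UUID:
    """Read the identifying UUID back from the start of the file."""
    with open(path, "rb") as f:
        raw = f.read(HEADER_UUID_SIZE)
    if len(raw) != HEADER_UUID_SIZE:
        raise ValueError("file too short to contain a header UUID")
    return uuid.UUID(bytes=raw)


def same_database(path_a: str, path_b: str) -> bool:
    """Compare databases by header UUID, never by file name string."""
    return read_database_id(path_a) == read_database_id(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        canonical = os.path.join(d, "employee.fdb")
        alternate = os.path.join(d, ".", "employee.fdb")  # same file, different string
        create_database(canonical)
        print(canonical == alternate)               # the name strings differ
        print(same_database(canonical, alternate))  # the identity does not
```

The point of the design is that the engine never has to canonicalize or compare file name strings, in any character set, to decide whether two connections refer to the same database.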

And, please, re-architect connection strings to get away from file names. Use some server-side mapping from database to database name string. This doesn't obviate rule 3 above, it just would make the system easier to use. If it were me, I'd use the Unicode mapping to uppercase and be done with it.
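
A server-side name mapping of that kind might look something like the Python sketch below. Everything in it is an assumption made for illustration: the DatabaseRegistry class, its methods, and the example path are hypothetical, and the NFC normalization step is an addition beyond the plain uppercase mapping suggested above (it makes composed and decomposed spellings of the same name compare equal).

```python
import unicodedata


class DatabaseRegistry:
    """Hypothetical server-side map from database names to file paths."""

    def __init__(self) -> None:
        self._paths: dict[str, str] = {}

    @staticmethod
    def normalize(name: str) -> str:
        # Unicode uppercase mapping, then NFC (the NFC step is an assumption
        # of this sketch, not something the message above asks for).
        return unicodedata.normalize("NFC", name.upper())

    def register(self, name: str, path: str) -> None:
        key = self.normalize(name)
        if key in self._paths:
            raise ValueError(f"database name already registered: {name!r}")
        self._paths[key] = path

    def resolve(self, name: str) -> str:
        # Raises KeyError for unknown names.
        return self._paths[self.normalize(name)]


if __name__ == "__main__":
    registry = DatabaseRegistry()
    registry.register("Employ\u00e9s", "/srv/firebird/emp.fdb")  # example path only
    print(registry.resolve("EMPLOY\u00c9S"))    # different case
    print(registry.resolve("employe\u0301s"))   # decomposed accent
```

Clients then connect by name only; the file path, whatever characters it contains, stays a server-side detail.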

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
