On 3/16/2016 8:49 AM, Lester Caine wrote:
On 16/03/16 11:39, James Starkey wrote:
Or simply restrict database file names to ASCII. It's not like users
have to deal with them, just like they don't have to deal with
identifiers in SQL, or C or Java.
As someone linguistically challenged, I have no problem with my own
code, but English is not the most used language on the planet, and for
many it creates another complication to programming in their own
language, so I can understand that 'restrict database file names to
ASCII' is as irritating these days as some of the other 'politically
correct' things we have to put up with. That SQL and other
programming languages are essentially 'English' is not a real case for
only supporting 'English' in the 21st century.
But many programming languages and OSes still can't cope with this
problem anyway so perhaps we have to live with that? :(
Putting aside the point that most people on the planet don't have to
specify Firebird database connection strings, let's look at where we
are, how we got here, and what makes sense going forward.
In my youth when "computer" and "IBM machine" were synonyms, EBCDIC
ruled the world. If you wanted to put on a print train with
non-standard glyphs, that was fine, but you were on your own. With
minicomputers came ASCII, sometimes called 7-bit ASCII by the ignorant.
Unix used ASCII, but had next to no support for anything but ASCII (this
budged when X-10 rolled around). Microsoft was among the first companies
to have OS recognition of national character sets with Windows in 1990.
If you wonder why Microsoft is so non-standard, it's because they
pre-dated the standard. Linux grew up outside the US, which was good
for national character sets.
As an industry, we've learned about international character sets the
hard way -- by making lots of mistakes. National character sets, we
eventually figured out, were bad, and a single coding to rule them all,
Unicode, was good. But there were lots of mistakes still in the pipe.
Here's what we've learned: National character sets are horrible.
Unicode is good. Unicode-16, alas, didn't cut it. UTF-16 is incredibly
stupid, except as a recovery path from Unicode-16 (sorry, Java). UTF-8
is the only rational encoding.
Here is where Firebird has to go:
1. The engine and wire protocol should be UTF-8 only. There should be
no overhead checking for character sets and no bugs where character
sets weren't checked.
2. Local character set conversion (required for keyboards and printers)
is client side only.
3. The engine takes a connection string and opens a database file with
the full knowledge that any number of file name strings may map to
the same file. If the engine needs a unique identifier for a
database file, it should use a UUID from the header page.
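A rough sketch of points 2 and 3, in Python for brevity. Everything here is hypothetical and for illustration only: the "header" is simply the first 16 bytes of a toy file, not Firebird's real header page layout, and the names to_wire, create_database and database_id are made up.

```python
import os
import tempfile
import uuid

# Assumption: a toy file format whose first 16 bytes hold the database UUID.
# This is NOT the real Firebird header page layout.
HEADER_UUID_SIZE = 16


def to_wire(local_bytes: bytes, client_charset: str) -> bytes:
    """Client side (point 2): transcode local-charset bytes to UTF-8."""
    return local_bytes.decode(client_charset).encode("utf-8")


def create_database(path: str) -> uuid.UUID:
    """Create a toy 'database' file that starts with a fresh UUID."""
    db_id = uuid.uuid4()
    with open(path, "wb") as f:
        f.write(db_id.bytes)
    return db_id


def database_id(path: str) -> uuid.UUID:
    """Engine side (point 3): identify a database by its header UUID."""
    with open(path, "rb") as f:
        return uuid.UUID(bytes=f.read(HEADER_UUID_SIZE))


def demo() -> bool:
    """Open the same file under two different name strings."""
    with tempfile.TemporaryDirectory() as d:
        original = create_database(os.path.join(d, "employee.fdb"))
        # A different string that names the same file.
        other_name = os.path.join(d, ".", "employee.fdb")
        return database_id(other_name) == original
```

The engine never compares file name strings here; two names refer to the same database exactly when the UUIDs read from the files match.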
And, please, re-architect connection strings to get away from file
names. Use some server-side mapping from database name string to
database. This doesn't obviate rule 3 above, it just would make the
system easier to use. If it were me, I would normalize the names with
the Unicode mapping to uppercase and be done with it.
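One way such a server-side mapping might look, again as a hypothetical Python sketch: DatabaseRegistry and its methods are invented names, and the only normalization applied is Python's str.upper(), which follows the Unicode uppercase mapping.

```python
class DatabaseRegistry:
    """Hypothetical server-side map from database name to database file."""

    def __init__(self) -> None:
        self._paths: dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Normalize with the Unicode uppercase mapping, nothing more.
        return name.upper()

    def register(self, name: str, path: str) -> None:
        """Associate a database name with a server-side file path."""
        self._paths[self._key(name)] = path

    def resolve(self, name: str) -> str:
        """Look up the file for a name; raises KeyError if unknown."""
        return self._paths[self._key(name)]


registry = DatabaseRegistry()
registry.register("Employés", "/data/emp.fdb")  # example name and path
```

With this, a client asks for "employés" or "EMPLOYÉS" and never sees the file name; only the server needs to know where the file lives.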
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel