> After some hot discussion in Firebird-Java, I want to propose to ditch 
> the NONE as default character set for newly created databases when no 
> database charset is specified.

That hot and overly long discussion started with
http://tech.groups.yahoo.com/group/Firebird-Java/message/10616

I want to state my point of view on that.

1. CharSet specification is critical for any text data.

1.1 In the current FB documentation it is usually placed in some far-away fine 
print, like Appendix D of the Jaybird PDF. That means that ordinary users, who 
neither wish to master all the knobs nor have vast prior experience, would 
just ignore those settings. No one would read every appendix just to create a 
database.

1.2 That would lead inexperienced developers (say, those accustomed to other DB 
engines but ready to provide FB compatibility if not too costly) to create the 
database in some random, client-defined charset. Then they would provide 
HowTos oblivious to charsets to their clients, who are even less literate in 
Firebird. That might lead to clients and developers having different 
environments, and so to bugs that cannot be reproduced.

1.2.Example: 2 weeks ago I was told that Firebird only supports KOI8-R for 
Russian. I was kind of speechless. I still cannot understand where that came 
from, since those developers use Linux/UTF-8 boxes. Maybe some of their clients 
used Linux/KOI8-R or BSD/KOI8-R, dunno. They also made such a HowTo, probably 
taking some ready PDF and copy-pasting it:
http://www.trackstudio.com/connecting-firebird.html
That, I think, is the level of knowledge to expect from people who have not 
historically been deep into the IB/FB context. If FB wants to reach out of its 
current niches, that is what to expect. Along with suggestions to use any other 
database, etc.

1.3 Unspecified charset and specified as NONE are different situations, 
different like zero and NULL. They might be considered differently. They might 
in the end get treated the same, but reasoning is to be independent.

1.3.1 Explicit NONE is to be treated as already specified in
http://www.firebirdsql.org/refdocs/langrefupd21-notes-charset-none.html
in both the core and all clients.
If some client cannot implement it (the Jaybird team claims it is impossible to 
inject a raw byte stream into a Java String), it would better refuse 
reading/writing than do some heuristics to convert data. 
When NONE is the user's explicitly stated will, it is not to be silently 
overridden.

1.3.2 A non-specified charset might be treated with heuristics, however the 
heuristics are to be consistent throughout the core and all clients.
Such heuristics are also to be documented explicitly and in an easy-to-find 
place. A developer making a connection should not just "bypass" or "not reach" 
that documentation. He is to be aware of the question. Imagine he designs a 
database on his Windows devel machine, then he (or his user) moves it to a Unix 
server and it suddenly breaks down. Not good. Such heuristics should be 
aimed at consistency of/with the database, not at speed, size, etc.

1.3.3 Changes in the handling of NONE are not necessarily propagated to the 
non-specified charset and vice versa. The mere fact that one variant switched 
behavior is not enough per se to change the other use case.
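
To make the three cases in 1.3.1 to 1.3.3 concrete, here is a minimal sketch in 
Python (not Jaybird or FBClient code). The names read_text, CODECS, 
DOCUMENTED_DEFAULT and CharsetRefused are hypothetical, and picking UTF8 as the 
documented default is only an assumption for illustration. "Not specified" is 
modelled as Python's None, explicit NONE as the string "NONE".

```python
from typing import Optional

# Hypothetical mapping from Firebird charset names to Python codec names.
CODECS = {"WIN1251": "cp1251", "KOI8R": "koi8_r", "UTF8": "utf-8"}

# Hypothetical single, documented rule for the "nothing specified" case.
DOCUMENTED_DEFAULT = "UTF8"


class CharsetRefused(Exception):
    """Text conversion would require guessing an encoding."""


def read_text(raw: bytes, charset: Optional[str]) -> str:
    """Convert column bytes to text without ever guessing silently."""
    if charset is None:
        # Unspecified: apply the one documented rule, identical on every
        # platform, instead of whatever the local OS default happens to be.
        charset = DOCUMENTED_DEFAULT
    name = charset.upper()
    if name == "NONE":
        # Explicit NONE is the user's stated will: refuse rather than guess.
        raise CharsetRefused("charset NONE: raw bytes only, no text conversion")
    return raw.decode(CODECS[name])
```

The point of the sketch is that the two branches are decided independently: 
changing DOCUMENTED_DEFAULT does not touch the NONE branch, and vice versa.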

1.3.4 In particular, if a client library chooses to do some transcoding of text, 
that means it has de facto assumed some connection charset and should report it 
to the server (and to the application). It would not end well if the client 
library reports one connection charset (for example NONE) but uses another (for 
example transcoding Win1251<->UTF-16) behind the curtains. It basically disables 
all the server's built-in safety checks. It might cause version-dependent 
incompatible behavior. It also may end in double conversion in chains like 
FB -> FBClient/FBEmbed -> Jaybird -> Java app.
No matter how many abstraction/API levels there are, the expectations 
about encodings should be explicit (better in the API, but in detailed docs at 
least) and strictly followed all the way through, at every boundary.
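
A small Python sketch of what goes wrong in 1.3.4 when one layer assumes a 
charset it never reported. It uses only Python's standard codecs, not any 
Firebird client. The helper names misread and double_convert are hypothetical, 
made up for this illustration.

```python
def misread(text: str, stored_as: str, read_as: str) -> str:
    """Bytes written in one charset are decoded by a layer assuming another."""
    return text.encode(stored_as).decode(read_as)


def double_convert(text: str, real: str, assumed: str) -> bytes:
    """Bytes already in `real` are taken for `assumed` and re-encoded to `real`.

    This is the double conversion that can happen when two layers of a chain
    each "helpfully" transcode the same data.
    """
    return text.encode(real).decode(assumed).encode(real)


if __name__ == "__main__":
    word = "Привет"
    # WIN1251 bytes read by a layer that silently assumes KOI8-R:
    print(misread(word, "cp1251", "koi8_r"))
    # UTF-8 bytes taken for WIN1251 and converted to UTF-8 once more:
    mangled = double_convert(word, "utf-8", "cp1251")
    print(len(word.encode("utf-8")), "bytes became", len(mangled))
```

Neither call raises an error, which is exactly the problem: the data is 
silently damaged and nothing along the chain can notice.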

2. Databases are about reliability. Better slow than halted, yet better halted 
than inconsistent and messy.

2.1 Enforcing strict rules is not bad. If charset-related behaviour were 
documented, then specifying it would not be a prohibitive burden for 
admins/devels.

2.1.1 Discouraging or even disabling fragile behaviour is not bad. If a user 
managed to run a NONE db with a NONE connection, or run a WIN1251 db storing 
KOI8-R-encoded texts, that is not an argument to continue like that, but rather 
an argument to halt and require mending before there is even more data to 
lose. 100% backward compatibility in this matter is turning into 
bug-compatibility.

2.2 FBClient/FBEmbedded/Jaybird/.NetProvider/ODBC-driver/KInterbasDB (if 
revived) are all released under the FB umbrella. Users can expect them to behave 
identically to FB Core.

2.2.1 The FB project is to have a policy on how clients are to deal with 
character sets. That policy is to require consistent behavior from all the 
official clients. A user who gained experience with one client is to be able to 
give advice to a user of another lib, or to set it up. If some user googled 
documentation or suggestions or HowTos for some client or for the core, he would 
expect them to be applicable to all official FB projects. And why not ?


While technically the server and the clients are different projects by different 
people, to outsiders they look like one body. And are expected to behave the 
same.

Thanks to everyone who managed to read all that scroll :-)


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
