> After some hot discussion in Firebird-Java, I want to propose to ditch
> the NONE as default character set for newly created databases when no
> database charset is specified.
The hot and overly long discussion mentioned above started with http://tech.groups.yahoo.com/group/Firebird-Java/message/10616. I want to state my point of view on it.

1. Charset specification is critical for any text data.

1.1 In the current FB documentation it is usually placed in some far-away fine print, like Appendix D in the Jaybird PDF. That means ordinary users, who have no wish to master all the knobs and no extensive prior experience, will simply ignore those settings. No one reads every appendix just to create a database.

1.2 That would lead inexperienced developers (say, those accustomed to other DB engines but ready to provide FB compatibility if it is not too costly) to create databases in some random, client-defined charset. They would then write HowTos oblivious to charsets for their clients, who are even less literate in Firebird. That can leave clients and developers running different environments, and hence with bugs that cannot be reproduced.

1.2.Example: Two weeks ago I was told that Firebird only supports KOI8-R for Russian. I was kind of speechless. I still cannot understand where that came from, since those developers use Linux/UTF-8 boxes. Maybe some of their clients used Linux/KOI8-R or BSD/KOI8-R, I don't know. They also made such a HowTo, probably by taking some ready PDF and copy-pasting it: http://www.trackstudio.com/connecting-firebird.html
I think that is the level of knowledge to be expected from people who have not historically been deep into the IB/FB context. If FB wants to reach beyond its current niches, that is what to expect, along with suggestions to use any other database, etc.

1.3 An unspecified charset and a charset specified as NONE are different situations, as different as zero and NULL. They might be considered separately. They might in the end be treated the same, but the reasoning for each should be independent.

1.3.1 Explicit NONE is to be treated as already specified in http://www.firebirdsql.org/refdocs/langrefupd21-notes-charset-none.html, in both the core and all clients.
If some client cannot implement it (the Jaybird team claims it is impossible to inject a raw byte stream into a Java String), it had better refuse reading/writing than apply some heuristics to convert the data. If NONE is the user's explicitly stated will, it must not be silently overridden.

1.3.2 An unspecified charset might be handled with heuristics; however, those heuristics must be consistent across the core and all clients, and they must be documented explicitly, in an easy-to-find place. A developer making a connection should not be able to simply "bypass" or "not reach" that documentation; he should be made aware of the question. Imagine he designs a database on his Windows development machine, then he (or his user) moves it to a Unix server and it suddenly breaks down. Not good. Such heuristics should aim at consistency of and with the database, not at speed, size, etc.

1.3.3 Changes in the handling of NONE are not necessarily propagated to the unspecified charset, and vice versa. The mere fact that one variant changed its behavior is not, by itself, enough reason to change a different use case.

1.3.4 In particular, if a client library chooses to transcode text, it has de facto assumed some connection charset and should report it to the server (and to the application). It would not end well if a client library reports one connection charset (for example NONE) but uses another behind the curtains (for example transcoding Win1251<->UTF-16). That basically disables all of the server's built-in safety checks. It might cause version-dependent, incompatible behavior. It may also end in double conversion in chains like FB -> FBClient/FBEmbed -> Jaybird -> Java app. No matter how many abstraction/API levels are involved, the expectations about encodings should be explicit (preferably in the API, but at least in detailed docs) and strictly followed all the way through, at every boundary.

2. Databases are about reliability. Better slow than halted, yet better halted than inconsistent and messed up.
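Points 1.3.1, 1.3.4 and 2 can be illustrated at the byte level without a database. Below is a minimal, hypothetical Java sketch: the class `CharsetDemo` and its methods are illustrative only, not part of Jaybird or any Firebird API, and no real FB connection is involved. It simulates two layers disagreeing about the encoding: WIN1251 text read back through a layer that already transcoded it to UTF-8 (double conversion), and WIN1251 bytes decoded under a wrong UTF-8 assumption. It contrasts the JDK's lenient `new String(bytes, charset)`, which silently substitutes replacement characters, with a `CharsetDecoder` configured with `CodingErrorAction.REPORT`, which refuses the input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only simulates what layers in a chain like
// FB -> FBClient -> Jaybird -> Java app could do to the same bytes.
public class CharsetDemo {

    // Lenient decoding: new String(bytes, cs) always replaces malformed or
    // unmappable input with the charset's replacement string, so bad data
    // flows onward without any error.
    public static String decodeLenient(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Strict decoding: malformed or unmappable input raises an exception,
    // i.e. "refuse reading" instead of guessing.
    public static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        Charset win1251 = Charset.forName("windows-1251");
        // "Privet" in Cyrillic, written as escapes to avoid source-encoding issues.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Text stored as WIN1251 bytes.
        byte[] stored = original.getBytes(win1251);

        // Decoded with the charset it was written in: round trip is intact.
        System.out.println(decodeLenient(stored, win1251).equals(original));

        // Double conversion: one layer already transcoded to UTF-8,
        // the next layer still assumes the bytes are WIN1251.
        byte[] transcoded = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeLenient(transcoded, win1251).equals(original));

        // Wrong assumption, lenient: WIN1251 bytes read as UTF-8 silently
        // become U+FFFD replacement characters.
        System.out.println(decodeLenient(stored, StandardCharsets.UTF_8).contains("\uFFFD"));

        // Wrong assumption, strict: the decoder refuses the input.
        try {
            decodeStrict(stored, StandardCharsets.UTF_8);
            System.out.println("accepted");
        } catch (CharacterCodingException e) {
            System.out.println("refused: " + e);
        }
    }
}
```

The strict decoder is a Java-side analogue of "better halted than inconsistent": the error surfaces at the boundary where the wrong assumption was made, instead of garbage reaching the database or the application.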
2.1 Enforcing strict rules is not bad. If charset-related behaviour were documented, specifying it would not be a prohibitive burden for admins/devels.

2.1.1 Discouraging or even disabling fragile behaviour is not bad. If a user managed to run a NONE db with a NONE connection, or a WIN1251 db storing KOI8-R-encoded texts, that is not an argument to continue like that, but rather an argument to halt and require mending before there is even more data to lose. 100% backward compatibility in this matter is turning into bug compatibility.

2.2 FBClient/FBEmbedded/Jaybird/.NetProvider/ODBC-driver/KInterbasDB (if revived) are all released under the FB umbrella. Users can expect them to behave identically to the FB core.

2.2.1 The FB project should have a policy on how clients deal with character sets, and all official clients should be required to follow it consistently. A user who has gained experience with one client should be able to give advice to a user of another library, or to set it up. If a user googled documentation, suggestions or HowTos for some client or for the core, he would expect them to apply to all official FB projects. And why not? While technically the server and the clients are different projects by different people, to outsiders they look like one body, and they are expected to behave the same.

Thanks to everyone who managed to read all that scroll :-)

Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
