I have no idea at on how to fix it "correctly", but would it not be an idea just to dump the tables with weird character into a text file, and search replace them.
The wrong chars are probably uniformly wrong, so you should be able to fix them using sed or some such tool (on linux, I am not well versed with windows) afterwards you clean up the db setting (droping it?) and the load the db again robert On 26.08.2016 13:58, Tomasz Tyrakowski [email protected] [firebird-support] wrote: > Hi, > > One of the legacy databases I happen to maintain has character sets and > collations messed up. As far as I know, the database was created with > default character set "NONE" instead of "WIN1250", which should have > been set back then (and it's over 10 years old). > Then, some columns have been added with the character set option set > explicitly to WIN1250 (alter table X add Y varchar(n) character set > WIN1250), while the old columns still remained as "NONE". > The clients were windows apps and used WIN1250 when connecting to the > database. > Later on, another maintainer changed the RDB$COLLATION_ID flags for some > of the columns in RDB$RELATION_FIELDS to indicate WIN1250 (value 3 as > far as I know, although I'm not entirely sure if this flag actually > concerns character sets, or only collation order - maybe the maintainer > was convinced he was clever, while in fact nothing relevant happened), > and set the default character set of the database to WIN1250 (in a way > unknown to me - maybe even via binary editing the GDB file and changing > some byte-flags inside). > So now the DB is kind of a mess (despite reporting dutifully WIN1250 as > the default character set), with some columns created when the DB had > "NONE" character set, some other create with character set WIN1250 set > explicitly upon creation, yet another created without explicit character > set but when the database has already had the "WIN1250" default > character set, and possibly some columns, for which the RDB$COLLATION_ID > was set by hand. > So my question is (in fact I'm not even sure what the right question is > :( ): assuming the strings in the database are in fact WIN1250-encoded > (all the clients used WIN1250), is setting the RDB$COLLATION_ID on all > text columns a good idea and would actually make any difference? > All strange quirks that happen from time to time with this database are > related to national characters. Sometimes a sort doesn't work as > expected, sometimes a client function (like strtoupper in PHP) doesn't > work on data from some of the columns (but works for other data). Is > there a chance that setting the collation flag might help? Or is there > another way to set it right? > I know the "right" right thing to do (pun intended) is to create a new > DB and pump the data, but it has about 400 interconnected tables and > over 60GB of size, so if there is another way, I would be grateful to > hear about it first. > > Thanks in advance and sorry for this lengthy story. > > regards > Tomasz >
