Well, since I originally added r2927 with Jay Pipes' help, I think that's the correct approach, so I'll add it back shortly and remove the r2909 upgrade script so it doesn't break anything.
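
For anyone following along, here is a rough sketch of the failure mode the quoted history describes: UTF-8 bytes sitting in a single-byte column, then read back over a `SET NAMES utf8` connection. It is plain Python with no database involved. The function names are made up for illustration, and Python's `latin-1` codec only stands in for a MySQL single-byte column character set, so treat it as an approximation rather than what Habari or MySQL actually does.

```python
def read_over_utf8_connection(stored_bytes: bytes, column_charset: str = "latin-1") -> str:
    """Hypothetical model of a read after `SET NAMES utf8`.

    The server interprets the stored bytes as characters of the column's
    character set, then transcodes those characters to UTF-8 for the client.
    """
    as_column_chars = stored_bytes.decode(column_charset)
    wire_bytes = as_column_chars.encode("utf-8")
    return wire_bytes.decode("utf-8")


def repair(mojibake: str, column_charset: str = "latin-1") -> str:
    """Undo the extra transcoding step: recover the raw bytes, read them as UTF-8."""
    return mojibake.encode(column_charset).decode("utf-8")


original = "naïve café"

# Early state: the application sends UTF-8 bytes, and the single-byte column
# stores them untouched as if each byte were one character.
stored = original.encode("utf-8")

# After the connection switches to utf8, the same bytes come back transcoded.
seen = read_over_utf8_connection(stored)
print(seen)          # naÃ¯ve cafÃ©
print(repair(seen))  # naïve café

# Text limited to the ASCII intersection survives unchanged.
print(read_over_utf8_connection("hello".encode("utf-8")))  # hello
```

The last call shows why installs that only used the intersection of the two character sets never noticed a problem.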

On Mon, Dec 15, 2008 at 2:04 PM, Geoffrey Sneddon <[email protected]> wrote:
>
> Hi,
>
> Following the discussions last week about character sets and arthus'
> install breaking, it's probably time to look back in time at all the
> various states Habari MySQL databases might be in before we try and
> write anything to fix it.
>
> Now, on to the history:
>
> The beginning:
>
> The Habari tables and the database connection followed whatever the
> default of the database was. We (naïvely) assumed everything that we
> received was UTF-8. This meant that, to function correctly, either the
> character set must be UTF-8 or a SBCS (single byte character set;
> i.e., every character is represented by a single byte; e.g., all
> ISO-8859 character sets) in which UTF-8 could be stored as binary data.
>
> r1377:
>
> This changed to interacting with the database by calling
> `SET NAMES utf8;`. This broke all blogs that weren't already using
> UTF-8, or using only the intersection between the character set in
> the database and UTF-8.
>
> The database could then be in three states:
> - UTF-8,
> - Only characters used in the intersection between the database
> character set and UTF-8 (normally ASCII only in an ASCII-superset such
> as ISO-8859-1);
> - Fresh installs are stored in whatever the default database character
> set is (this could be something completely different like UCS-2 which
> isn't even an ASCII-superset).
>
> Regardless of what the content is stored as in the database, it is now
> passed to PHP from MySQL as UTF-8.
>
> r1530:
>
> This converted all installs to UTF-8 tables, and in the process broke
> everything that didn't already use UTF-8, or used only the
> intersection between the character set in the database and UTF-8.
>
> This brought us down to two states:
> - UTF-8;
> - Fresh installs are stored in whatever the default database character
> set is (this could be something completely different like UCS-2 which
> isn't even an ASCII-superset).
>
> r2909:
>
> This made new installs use UTF-8. This also tried to move all existing
> installs to UTF-8, but failed (see arthus's breakage). This upgrade
> script was the same as in r1530 (this was wrong as we're coming from a
> different state).
>
> This resulted in everything being UTF-8, and breaking anything that
> was installed between r1530–r2908 where the default database character
> set was not UTF-8 (or didn't use only the intersection between the
> database character set and UTF-8).
>
> r2927:
>
> This replaced the upgrade script added in r2909. This should be the
> upgrade script we want.
>
> This brought us down to knowing the database is UTF-8.
>
> r2932:
>
> This reverted r2927. Both myself and Matthias thought the patch was
> wrong as the linked IRC discussion shows. This brings us back to the
> same undesirable state that r2909 left us in.
>
> This brings us to the present.
>
>
> Now, to get us out of this hole, the upgrade script in r2927 should be
> re-added and the r2909 one removed. Myself and Matt were wrong because
> we did not realize that the r1530 upgrade script would avoid UTF-8
> stored in a SBCS ever reaching this upgrade script. If anyone thinks
> this is wrong, please do say.
>
>
> --
> Geoffrey Sneddon
> <http://gsnedders.com/>
