[habari-dev] Re: A Brief History of Habari Databases and Their Character Sets

Chris Meller Mon, 15 Dec 2008 11:23:31 -0800

Well, since I originally added r2927 with Jay Pipes' help, I think that's
the correct approach, so I'll add it back shortly and remove r2909 so it
doesn't break anything.


On Mon, Dec 15, 2008 at 2:04 PM, Geoffrey Sneddon <[email protected]> wrote:

>
> Hi,
>
> Following last week's discussions about character sets and arthus's
> broken install, it's probably time to look back at all the various
> states Habari MySQL databases might be in before we try to write
> anything to fix it.
>
> Now, on to the history:
>
> The beginning:
>
> The Habari tables and the database connection followed whatever the
> database's default was. We (naïvely) assumed everything we received
> was UTF-8. This meant that, to function correctly, the character set
> had to be either UTF-8 or an SBCS (single-byte character set, i.e.,
> one where every character is represented by a single byte, e.g., all
> ISO-8859 character sets) in which UTF-8 could be stored as binary data.
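
This is a minimal Python model of why the SBCS case worked. It assumes ISO-8859-1 (Python's `latin-1` codec) as the database character set and no transcoding between the connection and the table. The function names are hypothetical, and no MySQL server is involved. The UTF-8 bytes look like garbage to the database, but they come back byte-for-byte identical:

```python
# Model of the original setup: Habari writes UTF-8 bytes, and the
# connection and table share one single-byte character set, so nothing
# is transcoded. Assumption: Python's "latin-1" codec (ISO-8859-1)
# stands in for the database character set; no real database is used.

def store_in_sbcs(utf8_bytes: bytes) -> str:
    # Every byte 0x00-0xFF is one ISO-8859-1 character, so this never fails.
    return utf8_bytes.decode("latin-1")

def read_from_sbcs(stored: str) -> bytes:
    # Reading back over the same single-byte connection yields the same bytes.
    return stored.encode("latin-1")

original = "café"
raw = original.encode("utf-8")
stored = store_in_sbcs(raw)
round_tripped = read_from_sbcs(stored).decode("utf-8")

print(stored)                     # cafÃ© (what the database "thinks" it holds)
print(round_tripped == original)  # True
```

The property doing the work is that every byte value maps to exactly one character. That is what makes the round trip lossless.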
>
> r1377:
>
> This changed Habari to begin its interaction with the database by
> calling `SET NAMES utf8;`. This broke every blog except those already
> using UTF-8 or using only the intersection between the database's
> character set and UTF-8.
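
The breakage can be sketched the same way. This assumes the server transcodes from the column's character set to the connection's on reads; it is modelled with Python codecs, and the names are hypothetical. Under that model, UTF-8 that was stored as binary in a latin-1 column reaches PHP double-encoded, while text in the ASCII intersection is unaffected:

```python
# Model of reading pre-existing data after `SET NAMES utf8;`.
# Assumption: the server decodes stored bytes in the column charset and
# re-encodes them in the connection charset. This is a model, not MySQL.

COLUMN_CHARSET = "latin-1"     # hypothetical table charset
CONNECTION_CHARSET = "utf-8"   # after SET NAMES utf8

def read_with_set_names(stored_bytes: bytes) -> str:
    # Server side: interpret the stored bytes in the column charset...
    text_on_server = stored_bytes.decode(COLUMN_CHARSET)
    # ...then transcode to the connection charset for the client.
    sent_to_php = text_on_server.encode(CONNECTION_CHARSET)
    # Client side: Habari treats what it receives as UTF-8.
    return sent_to_php.decode("utf-8")

# Data written before the change: raw UTF-8 bytes in the latin-1 column.
for text in ["cafe", "café"]:
    print(text, "->", read_with_set_names(text.encode("utf-8")))
# ASCII-only "cafe" survives; "café" comes back as "cafÃ©".
```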
>
> The database could then be in three states:
> - UTF-8;
> - Only characters in the intersection between the database character
> set and UTF-8 (normally ASCII only, in an ASCII superset such as
> ISO-8859-1);
> - Fresh installs are stored in whatever the default database character
> set is (this could be something completely different, like UCS-2,
> which isn't even an ASCII superset).
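
This is a rough way to test whether a piece of text falls in "the intersection". It assumes the phrase means the text encodes to identical bytes in both character sets. Python's `utf-16-be` codec stands in for UCS-2, since the two agree for characters in the Basic Multilingual Plane. `in_intersection` is a hypothetical helper:

```python
# Does `text` encode to the same bytes in `charset` as in UTF-8?
# Assumptions: "latin-1" stands in for ISO-8859-1 and "utf-16-be"
# for UCS-2 (identical for Basic Multilingual Plane characters).

def in_intersection(text: str, charset: str) -> bool:
    try:
        return text.encode(charset) == text.encode("utf-8")
    except UnicodeEncodeError:
        # The character set cannot represent the text at all.
        return False

for charset in ["latin-1", "utf-16-be"]:
    for text in ["hello", "café", "€"]:
        print(charset, repr(text), in_intersection(text, charset))
```

Even plain `hello` fails for UTF-16-BE, which is what "isn't even an ASCII superset" means in practice.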
>
> Regardless of the character set the content is stored in, MySQL now
> passes it to PHP as UTF-8.
>
> r1530:
>
> This converted all installs to UTF-8 tables, and in the process broke
> every install except those already using UTF-8 or using only the
> intersection between the database's character set and UTF-8.
>
> This brought us down to two states:
> - UTF-8;
> - Fresh installs are stored in whatever the default database character
> set is (this could be something completely different, like UCS-2,
> which isn't even an ASCII superset).
>
> r2909:
>
> This made new installs use UTF-8. It also tried to move all existing
> installs to UTF-8, but failed (see arthus's breakage): its upgrade
> script was the same as r1530's, which was wrong because we're coming
> from a different state.
>
> This resulted in everything being UTF-8, but broke any install made
> between r1530 and r2908 whose default database character set was not
> UTF-8 (unless it used only the intersection between the database
> character set and UTF-8).
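
Here is a hypothetical sketch of why the same upgrade script can be right for one starting state and wrong for another. It shows two generic ways to move a latin-1 column to UTF-8: reinterpreting the stored bytes as UTF-8, or transcoding them from latin-1. The message does not say which approach r1530, r2909 or r2927 takes. This is therefore only a model of the general problem, not of Habari's scripts, and the function names are made up:

```python
# Two hypothetical migration strategies for a latin-1 column.
# Neither is taken from Habari; they model generic approaches.

def reinterpret_as_utf8(stored: bytes) -> str:
    # Treat the stored bytes as if they had been UTF-8 all along.
    # Right for UTF-8 that was stored as binary in a single-byte charset.
    return stored.decode("utf-8", errors="replace")

def transcode_from_latin1(stored: bytes) -> str:
    # Decode the bytes in the column's declared charset (latin-1).
    # Right for text that really is latin-1.
    return stored.decode("latin-1")

utf8_in_sbcs = "café".encode("utf-8")    # UTF-8 bytes in a latin-1 column
real_latin1 = "café".encode("latin-1")   # genuine latin-1 bytes

print(reinterpret_as_utf8(utf8_in_sbcs))    # café (correct)
print(transcode_from_latin1(utf8_in_sbcs))  # cafÃ© (double-encoded)
print(reinterpret_as_utf8(real_latin1))     # caf\ufffd (corrupted)
print(transcode_from_latin1(real_latin1))   # café (correct)
```

Each strategy fixes one kind of stored data and corrupts the other, so a migration has to know which state it is starting from.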
>
> r2927:
>
> This replaced the upgrade script added in r2909. This should be the
> upgrade script we want.
>
> This brought us down to a single state: we know the database is UTF-8.
>
> r2932:
>
> This reverted r2927. Both Matthias and I thought the patch was wrong,
> as the linked IRC discussion shows. This brings us back to the same
> undesirable state that r2909 left us in.
>
> This brings us to the present.
>
>
> Now, to get us out of this hole, the upgrade script from r2927 should
> be re-added and the r2909 one removed. Matt and I were wrong because we
> did not realize that the r1530 upgrade script would prevent UTF-8
> stored in an SBCS from ever reaching this upgrade script. If anyone
> thinks this is wrong, please say so.
>
>
> --
> Geoffrey Sneddon
> <http://gsnedders.com/>
