On Wed, Jul 5, 2017 at 3:54 PM, David Karr <[email protected]> wrote:
> I've inherited a small webapp that is using MariaDB for persistence.
> Some of the forms have textarea fields for extended text to be
> entered.
>
> Someone reported an issue saving a form with some text that they had
> pasted from an email. The message started with this:
> ------------------
> Caused by: org.mariadb.jdbc.internal.util.dao.QueryException:
> Incorrect string value: '\xC2\x95\x09Onb...' for column 'ssimpact' at
> row 1
> ----------------
>
> I found where "Onb" is in the text, and right before it is a "bullet"
> character. So, this appeared to be a Unicode conversion issue. I
> tried pasting the same text after it had been passed to me, and it
> didn't fail. I'm pretty sure it didn't fail because that process of
> "passing it around" filtered the text to be all valid characters. The
> person who reported the problem said that when she just resubmitted
> it, it didn't fail. That might also point to a "cleansing" process
> that resulted in the submitted characters being legal.
>
> What are some reasonable strategies for getting this to work a little better?

Self-replying to add some more information.

I see from the output of "SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;"
that for my database, DEFAULT_CHARACTER_SET_NAME is "latin1" and
DEFAULT_COLLATION_NAME is "latin1_swedish_ci".

When I created the database, I just did "create database <name>;".
I'm guessing that when I created this database, I should have added
"CHARACTER SET = 'utf8'".

Now that my database is created, and I have data in it, if I do an
"alter table" on the tables that can have this data, will this do a
proper conversion of the existing data, and allow the insertion of
those "special" characters like bullets?

From https://mariadb.com/kb/en/mariadb/setting-character-sets-and-collations/ ,
I would guess I would do something like this:
-----------------
ALTER TABLE table_name CONVERT TO CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
-----------
I'm not certain about that collation name, but I noticed that the
"information_schema" database has the utf8 charset, and the
"utf8_general_ci" collation name.
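
For what it's worth, here is a small Python sketch that decodes the bytes quoted in the error message. It is only meant to show what '\xC2\x95' is; it does not talk to MariaDB, and the explanation in the comments (a Windows-1252 bullet byte read as ISO-8859-1 somewhere between the email and the form) is my assumption, not something I have confirmed. Python's "latin-1" codec is ISO-8859-1, which may not match MariaDB's "latin1" exactly.

```python
# Bytes quoted in the error message: '\xC2\x95\x09Onb...'
raw = b"\xc2\x95\x09Onb"

# Decoded as UTF-8, the first two bytes form a single code point.
text = raw.decode("utf-8")
first = text[0]
print(f"U+{ord(first):04X}")  # U+0095, a C1 control character, not a bullet

# In Windows-1252 the single byte 0x95 is the bullet character U+2022.
bullet = b"\x95".decode("cp1252")
print(f"U+{ord(bullet):04X}")  # U+2022

# Assumption: if that 0x95 byte was read as ISO-8859-1 instead, it becomes
# U+0095, which encodes to C2 95 in UTF-8, matching the error message.
misread = b"\x95".decode("latin-1").encode("utf-8")
print(misread == raw[:2])  # True
```

If that reading is right, the value reaching the database is U+0095 rather than a real bullet, which might explain why a copy that had been re-pasted or resubmitted went through.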

