Behrooz Shabani wrote:
I should mention again that our problem is that notices are saved
incorrectly in the database (the content field of the notice table),
because every character in our language takes 2 bytes in Unicode. So if
I write a notice 140 characters long, it is counted as 280 characters,
because we didn't tell MySQL that we are sending a UTF-8 string.

So, the problem, as I think you're phrasing it, is that notices longer
than 140 bytes are getting truncated at 140 bytes, and we're losing the
rest of the notice? But because the "rendered" field is stored as text
(no limit), we see the full notice in HTML? And if we use "SET NAMES
'UTF8'", this won't be a problem in the future?
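To make the truncation concrete, here is a rough Python sketch of what may be happening. It assumes the connection treats incoming bytes as latin1 and that "content" keeps at most 140 characters. The sample text and variable names are made up for illustration, and Python's latin-1 codec is only an approximation of MySQL's latin1.

```python
# Hypothetical notice: 140 Persian characters, 2 bytes each in UTF-8.
notice = "س" * 140
utf8_bytes = notice.encode("utf-8")

# Assumption: the server reads each byte as one latin1 character.
seen_by_server = utf8_bytes.decode("latin-1")

# Assumption: the column keeps only the first 140 characters.
stored = seen_by_server[:140]

# Reading the stored value back as UTF-8 yields only half the notice.
recovered = stored.encode("latin-1").decode("utf-8")

print(len(notice), len(utf8_bytes), len(seen_by_server), len(recovered))
```

Under these assumptions, a 140-character notice comes back as its first 70 characters.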
That makes some sense. However, it's going to be a serious problem,
since we've stored millions of notices in... I don't know what
charset... in Identi.ca. I'm not sure how many notices are affected, nor
how we're going to retrieve the data (I guess un-HTMLing the "rendered"
field and storing it back in "content" might do it.)
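A minimal sketch of the "un-HTMLing" idea, using only Python's standard library. The sample markup and the function name rendered_to_content are hypothetical, and this is not a tested migration. Anything the renderer changed (shortened links, for example) would not come back exactly.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def rendered_to_content(rendered):
    """Return the plain text of a rendered notice (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(rendered)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Made-up example of what a "rendered" value might look like.
sample = 'Hi <a href="http://example.org/bob">@bob</a> &amp; all'
print(rendered_to_content(sample))
```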
I'm concerned old notices will look like gobbledygook if we make this
change. See this screenshot for what happens when I change this in the
terminal:
http://evan.prodromou.name/images/set-names-utf8.png
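One possible explanation for the gobbledygook, offered as a guess and not a diagnosis: if the stored data is UTF-8 that was read as latin1 on the way in, it displays as pairs of accented Latin characters once the connection switches to UTF-8. The sketch below reproduces that effect in Python with a made-up string. Python's latin-1 codec stands in for MySQL's latin1, which is an approximation.

```python
original = "سلام"  # hypothetical notice text

# UTF-8 bytes misread as latin1, one character per byte.
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the misreading restores the text, as long as no bytes were lost.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired == original)
```

If this is what happened, notices that were not truncated could in principle be repaired the same way. Truncated ones would still be missing their tails.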
By the way, selecting length(content) on that notice shows 327, and
char_length(content) is 140, which suggests that MySQL knows the data
is UTF-8.
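For reference, MySQL's LENGTH() counts bytes and CHAR_LENGTH() counts characters, so 327 versus 140 means the stored characters average more than two bytes each. The same distinction in Python, with a made-up sample string (not the notice in question):

```python
sample = "سلام دنیا"  # hypothetical text: 8 Persian letters and 1 space

char_count = len(sample)                  # comparable to CHAR_LENGTH(content)
byte_count = len(sample.encode("utf-8"))  # comparable to LENGTH(content)

print(char_count, byte_count)
```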
Finally: I don't understand why some notices don't show up at all in the
output. What's going on with that?
-Evan