Hi,
when implementing web sockets, I encountered a problem with the QTextCodec
class.
This is a code snippet:
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
codec->toUnicode(someUtf8StringContainingNonCharacters, …);
When toUnicode is called with a string containing Unicode non-character code
points, QTextCodec reports a conversion error.
This is the intended behaviour of the QTextCodec class: non-character input is
explicitly covered by the unit tests, and is expected to fail.
However, non-character code points are valid in Unicode and should be preserved
as is; the Unicode Consortium published a corrigendum clarifying the handling
of non-characters:
http://www.unicode.org/versions/corrigendum9.html.
Of course, non-character codes are meant for internal use only, and don't have
a 'standard' meaning; they are application dependent.
Also, displaying a non-character code doesn't make sense (as they are not meant
to be displayed).
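
To make clear which code points I mean: Unicode defines 66 non-characters,
namely U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.
A minimal sketch of a check for them follows; the function name isNoncharacter
is my own and is not part of Qt.

```cpp
#include <cstdint>

// Returns true if cp is one of the 66 Unicode non-characters:
// U+FDD0..U+FDEF, plus the last two code points of every plane
// (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
// Hypothetical helper for illustration only; not a Qt function.
inline bool isNoncharacter(std::uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false; // outside the Unicode code space
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;  // the contiguous block in the BMP
    return (cp & 0xFFFE) == 0xFFFE; // xxFFFE and xxFFFF in each plane
}
```

For example, isNoncharacter(0xFFFE) is true, while isNoncharacter(0xFFFD)
(the replacement character) is false.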
Because I am using QTextCodec in my QWebSockets implementation, I encounter the
same problem (tests from Autobahn specifically checking on the acceptance of
non-character codes all fail). I really don't have a problem with that (see
Rationale at http://kurtpattyn.github.io/QWebSockets/), as text is just text in
my opinion. If one wants to exchange special characters in a text, I recommend
using a binary format for that.
But it does make QTextCodec non-compliant with Unicode.
So, my question is now: should we consider this a bug, and thus file a bug
report in Jira, or can we live with it?
Note that solving this issue could have an effect on QString as well, as it
needs to handle those non-characters. Maybe a flag could be added to
QTextCodec to indicate how those characters should be handled?
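
To illustrate what such a flag could look like, here is a rough standalone
sketch of a UTF-8 decoder with a non-character policy. It does not use Qt at
all, and every name in it (NoncharacterPolicy, DecodeResult, decodeUtf8) is
hypothetical; it is not a proposal for the actual QTextCodec API, just a way
to show the two behaviours side by side.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical policy flag; nothing like this exists in QTextCodec today.
enum class NoncharacterPolicy { Reject, Accept };

struct DecodeResult {
    std::vector<std::uint32_t> codePoints; // decoded code points
    bool ok = true;                        // false if decoding was aborted
};

// Assumes cp <= 0x10FFFF (checked by the caller below).
static bool isNoncharacterCodePoint(std::uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

// Decodes UTF-8 and stops at the first invalid sequence.
// Non-characters are an error only when policy is Reject.
DecodeResult decodeUtf8(const std::string &bytes, NoncharacterPolicy policy)
{
    DecodeResult result;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;      // continuation bytes expected
        std::uint32_t minValue = 0; // smallest value valid for this length
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minValue = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minValue = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minValue = 0x10000; }
        else { result.ok = false; return result; } // invalid lead byte

        if (bytes.size() - i - 1 < extra) { // truncated sequence
            result.ok = false;
            return result;
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { result.ok = false; return result; }
            cp = (cp << 6) | (c & 0x3F);
        }
        // Overlong forms, surrogates and values above U+10FFFF are always invalid.
        if (cp < minValue || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            result.ok = false;
            return result;
        }
        // Non-characters are well-formed; rejecting them is a policy decision.
        if (policy == NoncharacterPolicy::Reject && isNoncharacterCodePoint(cp)) {
            result.ok = false;
            return result;
        }
        result.codePoints.push_back(cp);
        i += extra + 1;
    }
    return result;
}
```

With this sketch, decoding the bytes EF BF BE (U+FFFE) fails under
NoncharacterPolicy::Reject and succeeds under NoncharacterPolicy::Accept,
while genuinely ill-formed input such as an overlong sequence fails under
both.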
Kurt