Hi,

While implementing WebSockets, I ran into a problem with the QTextCodec
class.
This code snippet shows it:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");
codec->toUnicode(someUtf8StringContainingNonCharacters, …);

When toUnicode is called with a string containing Unicode non-character code
points, QTextCodec reports a conversion error.
This is the intended behaviour of the QTextCodec class: the unit tests
explicitly feed it non-character input and expect the conversion to fail.

However, non-character code points are valid in Unicode and should be
preserved as is; the Unicode Consortium published a corrigendum clarifying
the handling of non-characters:
http://www.unicode.org/versions/corrigendum9.html.
Of course, non-characters are meant for internal use only and have no
'standard' meaning; their interpretation is application dependent.
Displaying a non-character doesn't make sense either, as they are not meant
to be displayed.
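
For reference, the set of non-characters is small and fixed: U+FDD0..U+FDEF
plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF,
U+1FFFE, ..., U+10FFFF), 66 in total. A minimal sketch of a check in plain
C++ follows; the function name isNonCharacter is my own and is not part of
Qt:

```cpp
#include <cstdint>

// Returns true if cp is one of the 66 Unicode non-characters:
// U+FDD0..U+FDEF, or a code point whose low 16 bits are FFFE or FFFF
// in any of the 17 planes. Values above U+10FFFF are not code points
// at all, so they are reported as false here.
// (Hypothetical helper for illustration, not a Qt function.)
constexpr bool isNonCharacter(std::uint32_t cp)
{
    return cp <= 0x10FFFF
        && ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE);
}
```

Note that U+FFFD (the replacement character) is an ordinary character and is
not matched by this check.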

Because I am using QTextCodec in my QWebSockets implementation, I inherit the
same behaviour: the Autobahn tests that specifically check acceptance of
non-characters all fail. I really don't have a problem with that (see
Rationale at http://kurtpattyn.github.io/QWebSockets/), as text is just text in
my opinion. If one wants to exchange special characters in a text, I recommend
using a binary format for that.

It does, however, make QTextCodec non-compliant with Unicode.

So my question is: should we consider this a bug, and thus file a bug
report in Jira, or can we live with it?
Note that solving this issue could have an effect on QString as well, as it
would need to handle those non-characters. Maybe a flag could be added to
QTextCodec to indicate how those characters should be handled?
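
To make the flag idea concrete, here is a rough sketch in plain C++ (no Qt)
of a UTF-8 decoder that takes such a policy. Everything in it is
hypothetical: NonCharacterPolicy, decodeUtf8 and isUnicodeNonCharacter are
names I made up for illustration, and nothing like this exists in
QTextCodec today.

```cpp
#include <cstddef>
#include <string>

// Hypothetical flag: how the decoder treats Unicode non-characters.
enum class NonCharacterPolicy { Reject, Accept };

// True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane.
constexpr bool isUnicodeNonCharacter(char32_t cp)
{
    return cp <= 0x10FFFF
        && ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE);
}

// Decodes UTF-8 from 'in' into 'out' (as UTF-32 code points).
// Returns false on malformed input, and also on a non-character when
// policy is Reject. On failure 'out' holds what was decoded so far.
bool decodeUtf8(const std::string &in, NonCharacterPolicy policy,
                std::u32string &out)
{
    // Smallest code point that legitimately needs a sequence of this length.
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else return false;                    // invalid lead byte
        if (i + len > in.size())
            return false;                     // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;                 // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minForLen[len])              // overlong encoding
            return false;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                     // out of range or surrogate
        if (policy == NonCharacterPolicy::Reject && isUnicodeNonCharacter(cp))
            return false;                     // the behaviour under discussion
        out.push_back(cp);
        i += len;
    }
    return true;
}
```

With Reject, the input "a" followed by the bytes EF BF BF (U+FFFF) fails, which
matches the behaviour described above; with Accept, the same input decodes to
two code points. Which of the two should be the default is exactly the open
question.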

Kurt
_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development