On Wed, Jun 17, 2009 at 9:46 AM, Ville M. Vainio <[email protected]> wrote:

>
> Our current clipboard approach has a fundamental problem: we are
> treating clipboard as text (unicode), but the xml data (outline
> structure) is not text. It's 8 bit binary data (that happens to be in
> utf8).


This may be a bug, but I wouldn't get excited and call it a fundamental
problem.

Leo assumes that all data it deals with internally is unicode.  This, and
nothing else, is fundamental.  To make this work, Leo converts from
so-called encoded formats to unicode when reading data, and converts back to
encoded formats when writing data.
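To make the convention concrete, here is a minimal sketch in Python 3 terms, where `str` plays the role of Python 2's `unicode`. The names `read_text`, `write_text` and `round_trip` are hypothetical helpers for illustration, not part of Leo:

```python
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read encoded bytes and decode them to unicode at the boundary."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)

def write_text(path, text, encoding="utf-8"):
    """Encode unicode text back to bytes only when writing."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))

def round_trip(text, encoding="utf-8"):
    """Write text to a temporary file and read it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_text(path, text, encoding)
        return read_text(path, encoding)
    finally:
        os.remove(path)
```

Everything between the read and the write sees only unicode; the encoding matters only at the two edges.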

g.toUnicode is the basis of this approach.  There may be, as you suggest,
instances where g.toUnicode isn't the easiest, or even correct, way to
convert data to unicode.  If so, we can deal with it, and use comments to
explain why g.toUnicode won't work.

I'm sure you understand this, but I want to say it anyway to emphasize what
is, and isn't, fundamentally important.

As for your recent change to g.app.gui.toUnicode::

    g.trace('Warning - toUnicode does encoding (bugs possible)')
    return unicode(s,encoding,errors='replace')

This is, in essence, exactly what g.toUnicode does.  Yes, errors are
possible if the encoding specified does not match the encoding used to
create the encoded text.  g.toUnicode will convert "bad" characters to '?'
characters.
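A small sketch of what a mismatched encoding does, again in Python 3 terms. The function `to_unicode` here is a stand-in written for this example, not Leo's g.toUnicode. Note that Python's decoder marks an undecodable byte with U+FFFD, the Unicode replacement character, which many displays render as a question mark:

```python
def to_unicode(data, encoding="utf-8"):
    """Decode bytes, replacing undecodable bytes instead of raising."""
    if isinstance(data, str):
        return data  # already unicode; nothing to do
    return data.decode(encoding, errors="replace")

latin1_bytes = "caf\u00e9".encode("latin-1")  # b'caf\xe9'
good = to_unicode(latin1_bytes, "latin-1")    # assumed encoding matches
bad = to_unicode(latin1_bytes, "utf-8")       # assumed encoding is wrong
```

With the matching encoding the text comes back intact; with the wrong one the final character is lost for good, because the replacement character carries no trace of the original byte.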

By the way, neither unicode nor g.toUnicode *does* encoding (that's the
reverse process).  Instead, these functions *assume* an encoding.  If there
is, in fact, a "fundamental" problem, it is that it is not possible, in
general, given an encoded string, to deduce the encoding.  But we aren't
going to change that.
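To illustrate why the encoding cannot be deduced in general, the sketch below (the helper name `decode_candidates` is made up for this example) tries several encodings on the same bytes. More than one succeeds without error, and nothing in the bytes says which result was intended:

```python
def decode_candidates(data, encodings):
    """Return {encoding: text} for every encoding that decodes cleanly."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            pass  # this encoding is ruled out, but others may remain
    return results

data = "\u00e9".encode("utf-8")  # the two bytes b'\xc3\xa9'
candidates = decode_candidates(data, ["ascii", "utf-8", "latin-1"])
```

Here ascii is ruled out, but both utf-8 (one character) and latin-1 (two characters) are valid readings of the same two bytes.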

Please note that g.app.gui.toUnicode is called only from the qt gui plugin.
Whatever encoding problems arise apply only to the gui plugin.  In essence,
the "general/fundamental" problem is simply stated:  how do we know what
encoding to pass to g.app.gui.toUnicode?

The tk gui handles this by using various settings to specify encodings.  All
will be well if those settings match the user's installation.

In short, there is a general/fundamental problem with calling unicode, or
equivalently, g.toUnicode: to avoid data loss (converting characters
to '?' characters, for instance), the encoding specified must match the
actual encoding of the encoded data.  There is no way around this problem,
and it has nothing specifically to do with Leo.

In *some* cases, we may be able to know for sure what the encoding is.
That's always good :-)

I suspect, but do not know for sure, that many of these problems will
disappear if people will specify utf-8 for Python's default encoding in
sitecustomize.py.  Certainly, I have no problems with non-ascii characters
on my machines.  If that isn't convenient for people, we may have to add a
setting that will tell the qt gui which encoding to assume by default.  We
will then pass that as the encoding param to g.toUnicode, for example.
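Such a setting might look roughly like the following sketch. The settings table, the key name `qt_default_encoding` and the function `gui_to_unicode` are all invented for illustration; no such setting exists in Leo as far as this message says:

```python
# Hypothetical settings table; the key name is invented for illustration.
SETTINGS = {"qt_default_encoding": "utf-8"}

def gui_to_unicode(data, encoding=None, settings=SETTINGS):
    """Decode GUI-supplied bytes, falling back to the configured default."""
    if isinstance(data, str):
        return data  # already unicode
    if encoding is None:
        # No explicit encoding given: use the user's configured default.
        encoding = settings.get("qt_default_encoding", "utf-8")
    return data.decode(encoding, errors="replace")
```

An explicit `encoding` argument still wins, so callers that do know the real encoding are unaffected by the setting.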

> Separate method should be used to put binary data to clipboard (in tk,
> it's probably always binary data?).
>
> We should use setMimeData of QClipboard to set binary data:
>
> http://doc.trolltech.com/4.5/qclipboard.html#setMimeData


When do we ever want to put binary data to the clipboard?  Are you calling
utf-8 encoded strings binary data?

Edward
