Jan Willem Stumpel wrote:
Again, I apologise if this has been asked on this list before.
If I make a web page whose text includes very long "words" (in my case
these are often file pathnames), the result is often very awkward line
breaks. They look even worse if the text is justified
("text-align: justify;" in the CSS).
It would be nice to allow such long "words" to be split at line breaks.
Unicode provides a method for this: the zero-width space (U+200B, or as
an HTML character reference: "&#8203;").
OK, so I changed in my page all "/" characters occurring in pathnames to
"/&#8203;". This indeed greatly improves the appearance of the HTML
page. Instead of
    This  is  a  very  long  path  name:
    /etc/this/is/one/very/long/path/name
I get
    This is a very long path name: /etc/this/is/one/very/long/
    path/name
The overall visual impression is now much better.
(This is only an illustration, trying to approximate the effect on
justified text, not involving actual zero width spaces; I hope it
survives the e-mail transmission).
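The substitution described above could be automated along these lines.
This is only a minimal sketch: the function names are made up for
illustration, and it assumes the pathname is available as a plain string
before the page is generated.

```python
import html

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def add_break_opportunities(path: str) -> str:
    """Return the path with a zero-width space after every '/'."""
    return path.replace("/", "/" + ZWSP)


def path_to_html(path: str) -> str:
    """Return HTML-escaped text with '&#8203;' after every '/'."""
    # Escape first, so the '&' of the character reference is not escaped.
    return html.escape(path).replace("/", "/&#8203;")


if __name__ == "__main__":
    print(path_to_html("/etc/this/is/one/very/long/path/name"))
```

The visible text is unchanged, but the string now contains one extra
character per slash, which is exactly what later ends up in the clipboard.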
However, trouble occurs when users try to copy-and-paste (using the
mouse) the pathnames into applications. The pathnames are now riddled
with invisible zero-width space characters, and will not be accepted as
valid pathnames by applications.
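To make the failure concrete, here is a small sketch of what a receiving
application would have to do to recover the real pathname. The helper
name and the choice of characters to strip (zero-width space and soft
hyphen) are assumptions for illustration, not something any application
is known to do.

```python
# Map the invisible characters to None so str.translate() deletes them.
STRIP_TABLE = {
    0x200B: None,  # ZERO WIDTH SPACE
    0x00AD: None,  # SOFT HYPHEN
}


def clean_pasted_path(text: str) -> str:
    """Remove zero-width spaces and soft hyphens from pasted text."""
    return text.translate(STRIP_TABLE)


if __name__ == "__main__":
    pasted = "/\u200betc/\u200bthis/\u200bname"
    print(pasted == "/etc/this/name")                     # False
    print(clean_pasted_path(pasted) == "/etc/this/name")  # True
```

Note that blindly stripping these characters would also alter a filename
that legitimately contains them, so this is a workaround, not a fix.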
So now it seems that I have a choice between two evils. Either ugly web
pages, or mysteriously unusable copy-and-paste.
Has this problem been discussed before in UTF-8 forums? I have no idea
how the copy-and-paste mechanism in Linux (or indeed anywhere else)
works. Would it be possible to specify (in the Unicode specifications)
that certain characters (like zero-width space, soft hyphen, maybe
others) have to be "uncopyable" or "unpastable", and would it be
possible to realise this technically? Or is this simply something that
should be called a bug in browsers?
Regards, Jan
I think copying the character is correct behavior. To achieve
line-breaking control without copy-and-paste artifacts, generate the
zero-width space from CSS instead of putting it in the text:
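<style type="text/css">
/* U+200B ZERO WIDTH SPACE as generated content after each span;
   it is not part of the document text itself. */
.word:after {
  content: '\200b';
}
</style>
<span class="word">/foo</span><span class="word">/bar</span><span class="word">/baz</span>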
If you want the same behavior in MSIE, you'll need to insert a <wbr />
element at the end of each span.
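Writing the spans by hand gets tedious for long paths, so the markup
could be generated. The following is a rough sketch only: the function
name and its parameters are invented here, and it assumes one span per
path component with the "word" class from the stylesheet above.

```python
import html
import re


def path_to_spans(path: str, css_class: str = "word",
                  add_wbr: bool = False) -> str:
    """Wrap each '/component' of a path in its own <span>."""
    # Split into pieces that each start at a '/', e.g. '/foo', '/bar'.
    parts = re.findall(r"/[^/]*|[^/]+", path)
    wbr = "<wbr />" if add_wbr else ""
    cls = html.escape(css_class, quote=True)
    return "".join(
        '<span class="{}">{}{}</span>'.format(cls, html.escape(part), wbr)
        for part in parts
    )


if __name__ == "__main__":
    print(path_to_spans("/foo/bar/baz"))
    print(path_to_spans("/foo/bar/baz", add_wbr=True))
```

With add_wbr=True a <wbr /> is placed at the end of each span, as
suggested for MSIE.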
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/