Jan Willem Stumpel wrote:
Again, I apologise if this has been asked on this list before.

If I make a web page whose text includes very long "words" (in my case, such "words" are often file pathnames), very awkward line breaks often result. They look even worse if the text is justified ("text-align: justify;" in the .css).

It would be nice to allow such long "words" to be split at line breaks. Unicode provides a character for this: the zero-width space (U+200B, written in HTML as "&#8203;").

OK, so in my page I changed every "/" character occurring in a pathname to "/&#8203;". This indeed greatly improves the appearance of the HTML page. Instead of

   This     is     a      very     long      path       name:
   /etc/this/is/one/very/long/path/name

I get

   This is a very long path name: /etc/this/is/one/very/long/
   path/name

The overall visual impression is now much better.

(This is only an illustration that approximates the effect on justified text; it does not involve actual zero-width spaces. I hope it survives the e-mail transmission.)
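The substitution described above can be automated when the page is generated. A minimal Python sketch, assuming the page is produced by a script; the helper name `breakable_path` is hypothetical. It escapes HTML special characters first, so the inserted "&#8203;" references are not themselves escaped:

```python
import html

# Numeric character reference for U+200B ZERO WIDTH SPACE.
ZWSP_REF = "&#8203;"

def breakable_path(path: str) -> str:
    """Hypothetical helper: return HTML for `path` with a line-break
    opportunity (a zero-width space) after every '/'."""
    escaped = html.escape(path)  # escapes & < > " ' but leaves '/' alone
    return escaped.replace("/", "/" + ZWSP_REF)

print(breakable_path("/etc/this/is/one/very/long/path/name"))
```

For "/a/b" this yields "/&#8203;a/&#8203;b". Every character reference inserted this way becomes a real U+200B in the rendered page, which is what later ends up in copied text.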

However, trouble occurs when users copy and paste the pathnames (using the mouse) into applications. The pathnames are now riddled with invisible zero-width space characters, and applications will not accept them as valid pathnames.
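To see why pasted paths fail, consider the text that ends up on the clipboard. A small Python sketch, assuming the browser copies the U+200B characters verbatim; the function names are hypothetical. The pasted string looks identical on screen, yet it compares unequal and is one character longer per slash. Stripping U+200B recovers the original path, but that is something the receiving application would have to do itself:

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

def simulate_copy(path: str) -> str:
    """Hypothetical: the text a browser copies from a page that has a
    zero-width space after every '/' in `path`."""
    return path.replace("/", "/" + ZWSP)

def strip_zwsp(text: str) -> str:
    """Remove every U+200B; a receiving program would have to do this itself."""
    return text.replace(ZWSP, "")

path = "/etc/this/is/one/very/long/path/name"
pasted = simulate_copy(path)

print(pasted == path)              # False: invisible characters are present
print(len(pasted) - len(path))     # 8: one extra character per slash
print(strip_zwsp(pasted) == path)  # True
```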

So now it seems that I have a choice between two evils: either ugly web pages or mysteriously unusable copy-and-paste.

Has this problem been discussed before in UTF-8 forums? I have no idea how the copy-and-paste mechanism in Linux (or indeed anywhere else) works. Would it be possible to specify (in the Unicode specifications) that certain characters (like zero-width space, soft hyphen, maybe others) have to be "uncopyable" or "unpastable", and would it be possible to realise this technically? Or is this simply something that should be called a bug in browsers?

Regards, Jan

I think copying the character is the correct behavior. To get line-breaking control without copy-and-paste artifacts, use CSS generated content:

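 <style type="text/css">
     /* Append a zero-width space (U+200B) after each path component.
        Most browsers do not include generated content in copied text. */
     .word:after {
         content: '\200b';
     }
 </style>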
 <span class="word">/foo</span><span class="word">/bar</span><span
 class="word">/baz</span>

If you want the same behavior in MSIE, you'll need to insert <wbr /> elements at the end of each span.
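If the pages are generated by a script, the span markup can be produced per pathname. A minimal Python sketch; the helper `path_to_spans` and its `add_wbr` flag are hypothetical. It keeps the spans adjacent, because any whitespace between them would render as a visible space, and it relies on the `.word:after` rule above to supply the zero-width space, so nothing invisible is added to the document text itself:

```python
import html
import re

def path_to_spans(path: str, add_wbr: bool = False) -> str:
    """Hypothetical helper: wrap each path component in <span class="word">.

    "/foo/bar" -> pieces "/foo", "/bar"; "a/b" -> "a", "/b".
    With add_wbr=True, a <wbr /> is placed at the end of each span (for MSIE).
    """
    pieces = re.findall(r"/[^/]*|[^/]+", path)
    suffix = "<wbr />" if add_wbr else ""
    # Join with "" so no whitespace appears between the spans.
    return "".join(
        f'<span class="word">{html.escape(p)}{suffix}</span>' for p in pieces
    )
```

For "/foo/bar/baz" it produces the same three-span markup as the example above; passing `add_wbr=True` appends `<wbr />` inside each span.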

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
