Re: Proposing mostly invisible characters

2019-09-13 Thread Asmus Freytag via Unicode

  
  
On 9/13/2019 10:50 AM, Richard Wordingham via Unicode wrote:

> On Fri, 13 Sep 2019 08:56:02 +0300
> Henri Sivonen via Unicode wrote:
>
> > On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode
> > wrote:
> >
> > > ISHY/SIHY is especially useful for encoding (German) noun compounds
> > > in wrapped titles, e.g. on product labeling, where hyphens are often
> > > suppressed for stylistic reasons, e.g. orthographically correct
> > > _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_
> > > (U+2010) may be rendered as _Spargel␤Suppe_ and could then be
> > > encoded as _SpargelSuppe_.
> >
> > Why should this stylistic decision be encoded in the text content as
> > opposed to being a policy applied on the CSS (or conceptually
> > equivalent) layer?
>
> How would you define such a property?
>
> Richard.

We should start with whether such a stylistic choice is general enough
that support in one or the other standard is indicated.

Color me "not convinced" on that point.

If product names (or descriptions) are wrapped in non-standard ways on
products and in advertising, that may well be common in those instances,
but they are like signage and not running text. The designer will either
use two text boxes or use a fixed-size one and insert a space to get the
(typo-)graphical appearance desired.

Short of seeing this in a block of text on a website where that block is
resized with screen size or resolution, I think we are arguing far ahead
of an actual use case.
  
A./
  


  



Re: Proposing mostly invisible characters

2019-09-13 Thread Richard Wordingham via Unicode
On Fri, 13 Sep 2019 08:56:02 +0300
Henri Sivonen via Unicode  wrote:

> On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode
>  wrote:
> 
> > ISHY/SIHY is especially useful for encoding (German) noun compounds
> > in wrapped titles, e.g. on product labeling, where hyphens are often
> > suppressed for stylistic reasons, e.g. orthographically correct
> > _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_
> > (U+2010) may be rendered as _Spargel␤Suppe_ and could then be
> > encoded as _SpargelSuppe_.
> >  
> 
> Why should this stylistic decision be encoded in the text content as
> opposed to being a policy applied on the CSS (or conceptually
> equivalent) layer?

How would you define such a property?

Richard.




Re: Proposing mostly invisible characters

2019-09-13 Thread Asmus Freytag via Unicode

  
  
On 9/12/2019 5:53 AM, Christoph Päper via Unicode wrote:

> ISHY/SIHY is especially useful for encoding (German) noun compounds in
> wrapped titles, e.g. on product labeling, where hyphens are often
> suppressed for stylistic reasons, e.g. orthographically correct
> _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be
> rendered as _Spargel␤Suppe_ and could then be encoded as
> _SpargelSuppe_.

Can you provide examples where this happens
in text that is not fixed layout, that is, a product website,
rather than a product label? For fixed layout, you cannot, in
principle, know that there wasn't a regular space used (or two
separate text boxes, or any other means to get the effect). 
  
A./

  



Re: Proposing mostly invisible characters

2019-09-13 Thread Christoph Päper via Unicode
CSS Text would indeed allow this in level 4:

  .label {hyphenate-character: "";}



However, this suggests that *all* SHYs therein should not produce a hyphen
glyph at the end of a line. I guess I would then need to show that there are
instances where this is not desired.

On 13 Sep 2019, at 07:59, Henri Sivonen via Unicode wrote:
> On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode wrote:
>
> > ISHY/SIHY is especially useful for encoding (German) noun compounds in
> > wrapped titles, e.g. on product labeling, where hyphens are often
> > suppressed for stylistic reasons, e.g. orthographically correct
> > _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be
> > rendered as _Spargel␤Suppe_ and could then be encoded as
> > _SpargelSuppe_.
>
> Why should this stylistic decision be encoded in the text content as
> opposed to being a policy applied on the CSS (or conceptually equivalent)
> layer?


Re: Proposing mostly invisible characters

2019-09-12 Thread Henri Sivonen via Unicode
On Thu, Sep 12, 2019, 15:53 Christoph Päper via Unicode 
wrote:

> ISHY/SIHY is especially useful for encoding (German) noun compounds in
> wrapped titles, e.g. on product labeling, where hyphens are often
> suppressed for stylistic reasons, e.g. orthographically correct
> _Spargelsuppe_, _Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be
> rendered as _Spargel␤Suppe_ and could then be encoded as
> _SpargelSuppe_.
>

Why should this stylistic decision be encoded in the text content as
opposed to being a policy applied on the CSS (or conceptually equivalent)
layer?


Re: Proposing mostly invisible characters

2019-09-12 Thread Richard Wordingham via Unicode
On Thu, 12 Sep 2019 14:53:45 +0200 (CEST)
Christoph Päper via Unicode  wrote:

> Dear Unicoders
> 
> There are some characters that have no precedent in existing
> encodings and are also hard to attest directly from printed sources.
> Can one still make a solid case for encoding those in Unicode? 
> 
> I am thinking of characters that are either invisible (most of the
> time) or can become invisible under certain circumstances.
> - INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)  
>   is *never* rendered as a hyphen,  
>   *but* the word it appears in is treated as if it contained one at
> its position. 

SOFT HYPHEN is supposed to be rendered in the manner appropriate to the
writing system, not necessarily like a HYPHEN.  In some writing
systems, such as, I gather, most very modern Lao writing systems, it
has no visual indication.  TUS claims that I was hallucinating when I
saw word wrapping hyphens in non-scriptio continua Pali in the Tai
Tham script in a Lao book.  (To put it less provocatively, one needs
user-level control of the rendering of soft hyphens.)

So, to make a proper case for INVISIBLE HYPHEN, you at least need
evidence of a contrast between soft-hyphen and an invisible hyphen.
Even then, you run the risk of being told that you should use a higher
level protocol which you will have to implement yourself.  Also, so
long as you don't need your text to be automatically split into words,
you can use ZWSP for the function.

Richard.



Proposing mostly invisible characters

2019-09-12 Thread Christoph Päper via Unicode
Dear Unicoders

There are some characters that have no precedent in existing encodings and are
also hard to attest directly from printed sources. Can one still make a solid
case for encoding those in Unicode?

I am thinking of characters that are either invisible (most of the time) or can
become invisible under certain circumstances.

Precedents
----------

- HYPHEN U+2010 is *always* rendered as a hyphen (i.e. a centered horizontal
  bar glyph),  
  which may look identical to HYPHEN-MINUS U+002D.

- SOFT HYPHEN (SHY) U+00AD is *only* rendered as a hyphen *when* it appears at
  the end of a line.

- At least four existing math operators are *never* rendered with a visible
  glyph  
  and only explicitly encode semantics where syntax is potentially ambiguous
  otherwise:

  * FUNCTION APPLICATION U+2061  
    is used where no multiplication is implied,  
    e.g. between an alphabetic function variable and an opening parenthesis:
    f(x).
  * INVISIBLE TIMES U+2062  
    is used where multiplication by either MULTIPLICATION SIGN U+00D7 or
    MIDDLE DOT U+00B7 is implied,  
    e.g. between a number and an alphabetic variable, constant or parenthesis:
    2πr(a+b).
  * INVISIBLE SEPARATOR U+2063  
    is used where enumeration by a COMMA U+002C or SEMICOLON U+003B (and
    possibly whitespace) is implied,  
    e.g. between two single-letter variable indices: aᵢⱼ.
  * INVISIBLE PLUS U+2064  
    is used where addition by PLUS SIGN U+002B is implied,  
    e.g. between an integer and a vulgar fraction: 1⅔.
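
The character properties behind the list above can be looked up in the Unicode Character Database. The following is only a minimal sketch, assuming Python's standard `unicodedata` module; the `PRECEDENTS` list and the `describe` helper are names made up for this illustration.

```python
import unicodedata

# Code points cited in the list above; all of them are already encoded.
PRECEDENTS = [0x002D, 0x2010, 0x00AD, 0x2061, 0x2062, 0x2063, 0x2064]


def describe(cp: int) -> tuple:
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)


for cp in PRECEDENTS:
    name, category = describe(cp)
    print(f"U+{cp:04X}  {category}  {name}")
```

The two visible hyphens report general category `Pd` (dash punctuation), whereas SHY and the four invisible operators report `Cf` (format), which is the class the suggested characters would presumably fall into as well.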

Suggestions
-----------

- INVERSE SOFT HYPHEN (ISHY) or SOFT INVISIBLE HYPHEN (SIHY)  
  is *always* rendered as a hyphen *unless* it appears at the end of a line.

- INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)  
  is *never* rendered as a hyphen,  
  *but* the word it appears in is treated as if it contained one at its
  position.

- INVERSE SOFT COMMA (ISC) or SOFT INVISIBLE COMMA (SIC)  
  is *always* rendered as a comma *unless* it appears at the end of a line.

- INVISIBLE OPEN PARENTHESIS (IOP) and INVISIBLE CLOSE PARENTHESIS (ICP)  
  *should not* be rendered with a visible glyph, but *may* be for inline
  fallback.

ISHY/SIHY is especially useful for encoding (German) noun compounds in wrapped
titles, e.g. on product labeling, where hyphens are often suppressed for
stylistic reasons: orthographically correct _Spargelsuppe_,
_Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may be rendered as
_Spargel␤Suppe_ and could then be encoded as _SpargelSuppe_ with an ISHY/SIHY
between _Spargel_ and _Suppe_.

Like the existing invisible math operators, IHY/ZWH is used where the presence
of its visible counterpart (i.e. HYPHEN) would be required syntactically (i.e.
orthographically), but can be derived from context and convention (at least by
human readers). This is useful for spell-checking, line-breaking etc., e.g. for
words (commercial names in particular) with internal capital letters that would
otherwise break orthographic rules and that should be broken at the end of a
line without a hyphen added (i.e. like ISHY/SIHY, not SHY). This is indeed very
similar to ZERO-WIDTH SPACE (ZWSP) and WORD JOINER (WJ), except that
ZWSP separates two words, whereas IHY/ZWH joins them into one, but unlike WJ
still allows a line break.
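
The difference between SHY and the suggested hyphens comes down to which glyph, if any, appears inline and which appears before a line break that is actually taken. The following is only a toy sketch of that behaviour, not a real line breaker: the private-use code points U+E000 and U+E001 are arbitrary stand-ins chosen for this illustration (no code points have been proposed), and `RULES` and `render` are hypothetical names.

```python
SHY = "\u00AD"   # SOFT HYPHEN (already encoded)
ISHY = "\uE000"  # hypothetical stand-in for INVERSE SOFT HYPHEN
IHY = "\uE001"   # hypothetical stand-in for INVISIBLE HYPHEN

# marker -> (glyph when the line is not broken here, glyph before a taken break)
RULES = {
    SHY: ("", "-"),    # visible only at the end of a line
    ISHY: ("-", ""),   # visible unless at the end of a line
    IHY: ("", ""),     # never visible, but still a break opportunity
}


def render(text: str, take_breaks: bool) -> str:
    """Render text, either breaking at every marker or at none of them."""
    out = []
    for ch in text:
        if ch in RULES:
            inline, at_break = RULES[ch]
            out.append(at_break + "\n" if take_breaks else inline)
        else:
            out.append(ch)
    return "".join(out)


for label, word in [("SHY", "Spargel" + SHY + "suppe"),
                    ("ISHY", "Spargel" + ISHY + "Suppe"),
                    ("IHY", "Spargel" + IHY + "Suppe")]:
    print(label, repr(render(word, False)), repr(render(word, True)))
```

In this sketch ISHY and IHY produce the same broken form and differ only in the unbroken one, which matches the descriptions above.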

ISC/SIC is particularly useful in wrapping table headers where a possible line 
break can take on the separating role of a comma.

IOP and ICP enclose mathematical expressions to override the operator
precedence that would otherwise apply, and they enclose textual annotations
that should be displayed outside the normal row of characters, e.g. a sum in
the numerator or denominator of a fraction and ruby/furigana pronunciation
hints, respectively. Both *may* be rendered inline where advanced typographic
functionality is unavailable and should then be parenthesized for clarity.
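
The inline fallback for IOP and ICP can be pictured as a plain substitution. This is only a sketch under stated assumptions: U+E010 and U+E011 are arbitrary private-use stand-ins picked for this illustration, and `inline_fallback` and `strip_invisible` are hypothetical helper names, not part of any existing API.

```python
IOP = "\uE010"  # hypothetical stand-in for INVISIBLE OPEN PARENTHESIS
ICP = "\uE011"  # hypothetical stand-in for INVISIBLE CLOSE PARENTHESIS


def inline_fallback(text: str) -> str:
    """Show the grouping with visible parentheses for plain inline display."""
    return text.replace(IOP, "(").replace(ICP, ")")


def strip_invisible(text: str) -> str:
    """Drop the markers, as a layout engine that draws the grouping itself would."""
    return text.replace(IOP, "").replace(ICP, "")


# A fraction whose numerator is a sum, written in linear form.
fraction = IOP + "a+b" + ICP + "/c"
print(inline_fallback(fraction))
print(strip_invisible(fraction))
```

Without the markers the linear form would read as a + (b/c), which is the ambiguity the invisible parentheses are meant to resolve.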