[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2024-05-06 Thread Dave
Follow-up Comment #9, bug #62300 (group groff):

[comment #7:]
> if the above is in fact the case, one of us should open a new bug report.

Bug #65693


___

Reply to this item at:

  

___
Message sent via Savannah
https://savannah.gnu.org/




[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2024-04-27 Thread Bjarni Ingi Gislason
Follow-up Comment #8, bug #62300 (group groff):

  The glyph 'u00A0' is defined in every "groff/build/font/devpdf/U-*" file
except U-ZD:


U-AB:u00A0  280,0   0   690 uni00A0 --  
U-ABI:u00A0 280,0,0,0,500   690 uni00A0 --  
U-AI:u00A0  277,0,0,0,500   690 uni00A0 --  
U-AR:u00A0  277,0   0   690 uni00A0 --  
U-BMB:u00A0 340,0   0   690 uni00A0 --  
U-BMBI:u00A0 340,0,0,0,500   690 uni00A0 --  
U-BMI:u00A0 300,0   0   690 uni00A0 --  
U-BMR:u00A0 320,0,0,0,500   690 uni00A0 --  
U-CB:u00A0  600,0   0   690 uni00A0 --  
U-CBI:u00A0 600,0,0,0,-336  0   690 uni00A0 --  
U-CI:u00A0  600,0,0,0,-269  0   690 uni00A0 --  
U-CR:u00A0  600,0   0   690 uni00A0 --  
U-HB:u00A0  278,0   0   690 uni00A0 --  
U-HBI:u00A0 278,0,0,17,-195,17  0   690 uni00A0 --  
U-HI:u00A0  278,0,0,0,-163  0   690 uni00A0 --  
U-HNB:u00A0 228,0   0   690 uni00A0 --  
U-HNBI:u00A0 228,0,0,0,960   690 uni00A0 --  
U-HNI:u00A0 228,0,0,0,960   690 uni00A0 --  
U-HNR:u00A0 228,0   0   690 uni00A0 --  
U-HR:u00A0  278,0   0   690 uni00A0 --  
U-NB:u00A0  287,0   0   690 uni00A0 --  
U-NBI:u00A0 287,0,0,0,500   690 uni00A0 --  
U-NI:u00A0  278,0,0,0,500   690 uni00A0 --  
U-NR:u00A0  278,0   0   690 uni00A0 --  
U-PB:u00A0  250,0   0   690 uni00A0 --  
U-PBI:u00A0 250,0,0,0,-75   0   690 uni00A0 --  
U-PI:u00A0  250,0,0,0,-75   0   690 uni00A0 --  
U-PR:u00A0  250,0   0   690 uni00A0 --  
U-TB:u00A0  250,0   0   690 uni00A0 --  
U-TBI:u00A0 250,0,0,0,-75   0   690 uni00A0 --  
U-TI:u00A0  250,0,0,0,-75   0   690 uni00A0 --  
U-TR:u00A0  250,0   0   690 uni00A0 --  
U-ZCMI:u00A0 220,0,0,0,500   690 uni00A0 --  








[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2024-04-27 Thread Dave
Follow-up Comment #7, bug #62300 (group groff):

[comment #6:]
> the font defines operators for the glyph, which results in a
> space of a certain width.

This is a fixed width?  If so, such fonts provide an undocumented exception to
this statement in groff_char(7): "a no-break space... is mapped to \~, the
adjustable non-breaking space escape sequence."

This ticket being closed, if the above is in fact the case, one of us should
open a new bug report.






[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2024-04-27 Thread Deri James
Follow-up Comment #6, bug #62300 (group groff):

\[u00A0] is meaningful if you are using a font which defines the glyph. (I
appear to have about 67 fonts which define the glyph; some even define kern
pairs for it, and one Tibetan font defines composites using it.)

Groff does not convert it to a horizontal motion and the font defines
operators for the glyph, which results in a space of a certain width.






[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2024-04-26 Thread Dave
Follow-up Comment #5, bug #62300 (group groff):

[comment #2:]
> The input sequence '\[u00A0]' is _syntactically_ valid...but
> like '\[u]' and '\[u]', it's not _meaningful_

Dear future me: next time you run across this comment and think "I responded
to this somewhere" but can't remember where: it's in comment 13 of bug #58930.






[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2022-04-13 Thread Dave
Follow-up Comment #4, bug #62300 (project groff):

> Convert input U+00A0 to \~ as troff would, not to \[u00A0].

...or as troff _should_, were it not for bug #58962.






[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2022-04-12 Thread G. Branden Robinson
Update of bug #62300 (project groff):

                  Status:             In Progress => Fixed
             Open/Closed:                    Open => Closed
         Planned Release:                    None => 1.23.0

___

Follow-up Comment #3:


commit a22ceaea8df79608cf31b8080565053c35a0c7cc
Author: G. Branden Robinson 
Date:   Tue Apr 12 13:19:19 2022 +1000

[preconv]: Fix Savannah #62300.

* src/preproc/preconv/preconv.cpp (unicode_entity): Convert input U+00A0
  to \~ as troff would, not to \[u00A0].

Fixes .







[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2022-04-12 Thread G. Branden Robinson
Follow-up Comment #2, bug #62300 (project groff):

Hi Bjarni,

[comment #1:]
> commit f47b7dd139525bf3b8b4fbe767c3a45816c8445a
> Author: Bjarni Ingi Gislason 
> Date:   Sat Nov 17 15:59:09 2018 +
> 
> The character \[u00A0] is not recognized
> 
>   The input character "no-break space" (' ', 0xA0) is mapped by "groff"
> to '\~' (groff_char(7)), but only the character name '\[char160]' is
> translated in the file "tmac/troffrc".

Yes.
 
>   The "preconv" translates the no-break space to the name '\[u00A0]'.

That was an error and is the subject of this ticket.
 
> diff --git a/tmac/troffrc b/tmac/troffrc
> index 1bd4aa8c9..8895a9a01 100644
> --- a/tmac/troffrc
> +++ b/tmac/troffrc
> @@ -33,10 +33,14 @@ troffrc!X100 troffrc!X100-12 troffrc!lj4 troff!lbp troffrc!html troffrc!pdf
>  .
>  .\" Test whether we work under EBCDIC and map the no-breakable space
>  .\" character accordingly.
> -.do ie '\[char97]'a' \
> +.do ie '\[char97]'a' \{\
>  .do tr \[char160]\~
> -.el \
> +.do tr \[u00A0]\~
> +.\}
> +.el \{\
>  .do tr \[char65]\~
> +.do tr \[u0041]\~
> +.\}
>  .
>  .\" Set the hyphenation language to 'us'.
>  .do hla us
> 

I'm not sure I agree with this patch.  It's preconv's job to produce valid
(GNU) troff _input_.  It was not doing so.

The input sequence '\[u00A0]' is _syntactically_ valid...but like '\[u]'
and '\[u]', it's not _meaningful_, and should be warned about.

Here is the patch I have pending.


diff --git a/src/preproc/preconv/preconv.cpp b/src/preproc/preconv/preconv.cpp
index 83feef8f7..b1027af17 100644
--- a/src/preproc/preconv/preconv.cpp
+++ b/src/preproc/preconv/preconv.cpp
@@ -404,9 +404,13 @@ unicode_entity(int u)
   if (u < 0x80)
     putchar(u);
   else {
-    // Handle soft hyphen specially -- it is an input character only,
-    // not a glyph.
-    if (u == 0xAD) {
+    // Handle no-break space and soft hyphen specially--they are input
+    // characters only, not glyphs.  See groff_char(7).
+    if (u == 0xA0) {
+      putchar('\\');
+      putchar('~');
+    }
+    else if (u == 0xAD) {
       putchar('\\');
       putchar('%');
     }
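
For readers who want to try the mapping outside of preconv, here is a
minimal standalone sketch of the behaviour the patch above aims for.  It is
not the code from preconv.cpp: the function name entity_for is hypothetical,
it returns a string instead of calling putchar, and the fallback format
(four uppercase hex digits) is an assumption inferred from the '\[u00A0]'
and '\[u00A1]' spellings quoted in this ticket.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the patched logic; assumes u >= 0.
// Returns the troff input that would represent code point u.
std::string entity_for(int u)
{
  if (u < 0x80)                       // ASCII passes through unchanged
    return std::string(1, static_cast<char>(u));
  if (u == 0xA0)                      // no-break space -> \~
    return "\\~";
  if (u == 0xAD)                      // soft hyphen -> \%
    return "\\%";
  // Assumed fallback: a Unicode special character escape sequence.
  char buf[16];
  std::snprintf(buf, sizeof buf, "\\[u%04X]", static_cast<unsigned>(u));
  return buf;
}
```

Under these assumptions, entity_for(0xA0) yields the two characters '\~'
where the pre-patch behaviour reported in this ticket was '\[u00A0]'.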







[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2022-04-11 Thread Bjarni Ingi Gislason
Follow-up Comment #1, bug #62300 (project groff):

commit f47b7dd139525bf3b8b4fbe767c3a45816c8445a
Author: Bjarni Ingi Gislason 
Date:   Sat Nov 17 15:59:09 2018 +

The character \[u00A0] is not recognized

  The input character "no-break space" (' ', 0xA0) is mapped by "groff"
to '\~' (groff_char(7)), but only the character name '\[char160]' is
translated in the file "tmac/troffrc".

  The "preconv" translates the no-break space to the name '\[u00A0]'.

  Example:

.pl 3v
A\[char161]B\[char160]C
.br
.\".tr \[u00A0]\~
A\[u00A1]B\[u00A0]C

  or

echo ' ' | preconv

Signed-off-by: Bjarni Ingi Gislason 

diff --git a/tmac/troffrc b/tmac/troffrc
index 1bd4aa8c9..8895a9a01 100644
--- a/tmac/troffrc
+++ b/tmac/troffrc
@@ -33,10 +33,14 @@ troffrc!X100 troffrc!X100-12 troffrc!lj4 troff!lbp troffrc!html troffrc!pdf
 .
 .\" Test whether we work under EBCDIC and map the no-breakable space
 .\" character accordingly.
-.do ie '\[char97]'a' \
+.do ie '\[char97]'a' \{\
 .  do tr \[char160]\~
-.el \
+.  do tr \[u00A0]\~
+.\}
+.el \{\
 .  do tr \[char65]\~
+.  do tr \[u0041]\~
+.\}
 .
 .\" Set the hyphenation language to 'us'.
 .do hla us







[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should

2022-04-11 Thread G. Branden Robinson
URL:
  

                 Summary: [preconv] does not handle U+00A0 (NBSP) as it should
                 Project: GNU troff
            Submitted by: gbranden
            Submitted on: Tue 12 Apr 2022 03:14:52 AM UTC
                Category: Preprocessor preconv
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: In Progress
                 Privacy: Public
             Assigned to: gbranden
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

___

Details:

preconv handles the soft hyphen by translating it into an appropriate escape
sequence (\%), but does not do the same for the no-break space.  groff_char(7)
has long defined the semantics of these as input code points (for ISO
character encodings).


$ cat whaaa.man 
.TH ISO_8859-2 7 2014-10-02 "Linux" "Linux Programmer's Manual"
.TS
l l l c lp-1.
240 160 A0      NO-BREAK SPACE
255 173 AD  ­   SOFT HYPHEN
.TE
$ xxd whaaa.man
0000: 2e54 4820 4953 4f5f 3838 3539 2d32 2037  .TH ISO_8859-2 7
0010: 2032 3031 342d 3130 2d30 3220 224c 696e   2014-10-02 "Lin
0020: 7578 2220 224c 696e 7578 2050 726f 6772  ux" "Linux Progr
0030: 616d 6d65 7227 7320 4d61 6e75 616c 220a  ammer's Manual".
0040: 2e54 530a 6c20 6c20 6c20 6320 6c70 2d31  .TS.l l l c lp-1
0050: 2e0a 3234 3009 3136 3009 4130 09c2 a009  ..240.160.A0....
0060: 4e4f 2d42 5245 414b 2053 5041 4345 0a32  NO-BREAK SPACE.2
0070: 3535 0931 3733 0941 4409 c2ad 0953 4f46  55.173.AD....SOF
0080: 5420 4859 5048 454e 0a2e 5445 0a         T HYPHEN..TE.
$ groff -t -kz -man whaaa.man # groff 1.22.4
troff: whaaa.man:4: warning: can't find special character 'u00A0'
$ ./build/test-groff -ww -t -kz -man whaaa.man # groff Git HEAD
troff:whaaa.man:4: warning: can't find special character 'u00A0'
$ preconv whaaa.man # groff 1.22.4 and Git HEAD
.lf 1 whaaa.man
.TH ISO_8859-2 7 2014-10-02 "Linux" "Linux Programmer's Manual"
.TS
l l l c lp-1.
240 160 A0  \[u00A0]NO-BREAK SPACE
255 173 AD  \%  SOFT HYPHEN
.TE
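
The hex dump above contains the byte pairs c2 a0 and c2 ad, which are the
UTF-8 encodings of U+00A0 and U+00AD.  The sketch below shows that decoding
step in isolation.  It is an illustration only, not preconv's decoder: the
name decode_utf8 is hypothetical, and the function does not reject overlong
encodings or surrogates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into code points.
// Malformed or truncated sequences yield U+FFFD.
std::vector<char32_t> decode_utf8(const std::string &bytes)
{
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    unsigned char b0 = static_cast<unsigned char>(bytes[i]);
    char32_t cp;
    std::size_t len;
    if (b0 < 0x80)                { cp = b0;        len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
    if (i + len > bytes.size()) {                   // truncated input
      out.push_back(0xFFFD);
      break;
    }
    bool ok = true;
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char b = static_cast<unsigned char>(bytes[i + k]);
      if ((b & 0xC0) != 0x80) { ok = false; break; }
      cp = (cp << 6) | (b & 0x3F);  // append six payload bits
    }
    if (ok) { out.push_back(cp); i += len; }
    else    { out.push_back(0xFFFD); ++i; }
  }
  return out;
}
```

Feeding it the two bytes c2 a0 gives the single code point 0xA0 (decimal
160), the input character that groff_char(7) says is mapped to '\~'.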


preconv should put \~ on the output as documented in groff_char(7) even in
groff 1.22.4.


       160    the ISO latin1 no‐break space is mapped to ‘\~’, the
              stretchable space character.

       173    the soft hyphen control character.  groff never uses
              this character for output (thus it is omitted in the
              table below); the input character 173 is mapped onto
              ‘\%’.


This remapping should occur because the diagnostic itself is not the problem;
there are many Unicode code points that are not valid groff input; expressing
them as special character escape sequences does not change that fact.

Working on this.



