Preparation for use of XS paragraph formatting module

Gavin Smith Mon, 29 Jun 2015 10:29:25 -0700

Hi Patrice and anyone else who cares to comment,

As you may know I've been rewriting Paragraph.pm, the formatter module
for paragraphs, in C, to be used as a loadable XS module by Perl. Due
to Perl's slow text processing capabilities, paragraph formatting
takes up a sizable proportion of the run-time of makeinfo/texi2any
when outputting an Info file.


For comparison, here's the timing of a run using the Perl Paragraph.pm
on the sources of the Emacs Lisp manual (about 3.3 megs of Texinfo
source):

real    0m54.751s
user    0m46.124s
sys     0m0.266s

Now using the C replacement:

real    0m34.367s
user    0m29.865s
sys     0m0.267s

Although the C module is not complete, I don't expect these numbers
(roughly a 37% reduction in real time) to change very much.

I hope that this XS module can be completed and integrated into
texi2any. If that can be done, it should be possible to replace other
parts of texi2any as well for speed (notably the parser module, which
is a much bigger job to rewrite).

I'm at the stage now where the choice of whether to use Paragraph.pm
or the XS module (I've called it XSParagraph) is made by commenting
out a single line in Plaintext.pm. This works when running texi2any.pl
from within the source directory; installing and distributing the
module will raise further problems (it probably needs libtool or
something similar).

In order to make this possible, I've made preparatory changes to the
Perl modules, which I am attaching here for review. The changes relate
to the question of whether there should be one space after a full
stop, or two.

As you know, a capital letter before a full stop suppresses an end of
sentence. There is a complication with constructs like "@sc{a. b.}"
which should give the output "A.  B." and not "A. B.". Currently
texi2any deals with this with a concept of "underlying text": when
formatting "A. B." it looks at a string like "a. b." to decide if it
is at the end of a sentence.

I've found this use of underlying text hard to understand when reading
the code. I didn't want to write the C code to process underlying text
along with the main text, and also there may be performance
implications in doing things twice. So I've changed the code to use a
different approach: insert a marker character, which will not appear
in the output, before a ., ? or ! that is allowed to terminate a
sentence in spite of a preceding upper-case letter. This might seem
like a hack, but it shouldn't cause any problems because the marker
character (backspace, \x08) won't otherwise be passed in the argument,
and it was easy to implement the interpretation of this in XSParagraph.
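
A minimal sketch of the marker idea, in Perl, may make this concrete.
It is a simplification of what the attached patch does, not a copy of
it: the helper names mark_sentence_ends and word_ends_sentence are
hypothetical, and the sketch handles only single words split on a
space.

```perl
use strict;
use warnings;

my $MARKER = "\x08";   # backspace, the marker character used in the patch

# Hypothetical helper: before upper-casing, put a marker in front of
# each ., ? or ! that follows a character that is not upper case, so
# that this information survives the later call to uc().
sub mark_sentence_ends {
    my ($text) = @_;
    $text =~ s/(?<=[^[:upper:]])(?=[.?!]["')\]]*(?:\s|$))/$MARKER/g;
    return $text;
}

# Hypothetical helper: strip the marker from a word and report whether
# the word ends a sentence.  Returns (cleaned word, 0 or 1).
sub word_ends_sentence {
    my ($word) = @_;
    # Remember whether a marker preceded the final punctuation.
    my $marked = ($word =~ s/\x08(?=[.?!]["')\]]*$)//) ? 1 : 0;
    # No ., ? or ! at the end (ignoring closing punctuation): no end.
    return ($word, 0) unless $word =~ /[.?!]["')\]]*$/;
    # Last character before the trailing punctuation, if there is one.
    my ($last) = $word =~ /([^.?!"')\]])[.?!"')\]]*$/;
    my $ends = ($marked or !defined $last or $last !~ /[[:upper:]]/) ? 1 : 0;
    return ($word, $ends);
}

# Usage: the argument of @sc{a. b.} is marked, then upper-cased.
my $text = uc(mark_sentence_ends("a. b."));
for my $raw (split / /, $text) {
    my ($word, $ends) = word_ends_sentence($raw);
    printf "%s -> %s\n", $word, $ends ? "end of sentence" : "no end";
}
```

With the marker in place, both "A." and "B." are treated as sentence
ends, whereas a bare "A." with no marker is not.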

I acknowledge that this is a big patch to look at. The most
interesting part of it is the changes to Plaintext.pm, which
demonstrates the interface that the formatter modules now provide. If
anyone has time to have a look at this, or suggest what I'm missing,
it would be appreciated.

"make check" reports 2 failures with these changes, both for tests
which used add_underlying_text directly. When I switch to XSParagraph,
I get 3 failures: the 2 mentioned, plus one that had accent combining
characters in the output, which Paragraph.pm was assuming had width 1
(they were counted by Perl code like "length($word)"), when actually
they had display width 0, leading to a line being wrapped differently.
Output looks like:

   *note ª º ★ £ ⊣ ¿ ®:: *note ⇒ ° a b a sunny day å:: *note Å æ œ Æ Œ ø
Ø ß ł Ł Ð ð Þ þ:: *note ä ẽ î â à é ç ē e̊ e̋ ę:: *note ė ĕ e̲ ẹ ě j
ee͡:: *note ı Ḕ

when it should be

   *note ª º ★ £ ⊣ ¿ ®:: *note ⇒ ° a b a sunny day å:: *note Å æ œ Æ Œ ø
Ø ß ł Ł Ð ð Þ þ:: *note ä ẽ î â à é ç ē e̊ e̋ ę:: *note ė ĕ e̲ ẹ ě j ee͡::
*note ı Ḕ Ḉ

(Don't know how these will show up in the email...) This was in
t/results/converters_tests/at_commands_in_refs_utf8/res_info/at_commands_in_refs_utf8.info
and 
t/results/converters_tests/at_commands_in_refs_utf8/out_info/at_commands_in_refs_utf8.info
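
The difference between character count and display width can be
sketched as follows. This is only an illustration under a simplifying
assumption: it treats nonspacing marks (Unicode category Mn) as width 0
and every other character as width 1, and ignores full-width
characters. The function name approx_display_width is hypothetical and
is not part of Paragraph.pm or XSParagraph.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate the number of columns a word takes
# up by ignoring zero-width combining marks.
sub approx_display_width {
    my ($word) = @_;
    (my $visible = $word) =~ s/\p{Mn}//g;   # drop nonspacing marks
    return length($visible);
}

# "e" followed by U+030A COMBINING RING ABOVE: two characters as far
# as length() is concerned, but one column on the screen.
my $word = "e\x{030A}";
printf "length=%d width=%d\n", length($word), approx_display_width($word);
```

A line containing several such characters therefore looks longer to
length() than it is on screen, which is enough to move a line break.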

I'd like to make these changes now, although I will need to do more
work and testing on XSParagraph before it can be enabled by default.

Best wishes,
Gavin

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 6365)
+++ ChangeLog	(working copy)
@@ -1,3 +1,46 @@
+2015-??-??  Gavin Smith  <[email protected]>
+
+	* tp/Texinfo/Convert/Line.pm, tp/Texinfo/Convert/Paragraph.pm,
+	tp/Texinfo/Convert/UnFilled.pm: (allow_end_sentence): New function.
+
+	* tp/Texinfo/Convert/Line.pm, tp/Texinfo/Convert/Paragraph.pm,
+	tp/Texinfo/Convert/UnFilled.pm (_add_text, add_next, _add_next): 
+	Handle backspace as a marker to allow an end of sentence.
+
+	* tp/Texinfo/Convert/Plaintext.pm (_protect_sentence_ends): New 
+	function.
+	(_process_text): Don't return a pair the second element of which 
+	is the underlying text.  Instead, call _protect_sentence_ends on 
+	the text.  No special handling of @code or @var.  Caller in 
+	_convert updated.
+
+	(new_formatter): Add commented-out line to use XSParagraph 
+	instead of Texinfo::Convert::Paragraph.
+
+	(_count_added): Reinstate a commented-out use of end_line_count 
+	method.
+
+	(_convert): Remove check for 'underlying_text' element, which 
+	was only used for @acronym and @abbr.
+	<@acronym and @abbr>: Don't cause underlying text to be saved in 
+	the formatters.  Instead, call allow_end_sentence after 
+	converting the argument, and return the result of the 
+	conversion.
+	<close @var and close monospace>: Call allow_end_sentence method 
+	on formatter.
+	<Brace commands with no arguments> If command is not a single 
+	character, call allow_end_sentence after converting.  Call
+	allow_end_sentence if in @var or monospace.
+	<accent commands>: Don't pass underlying text to the formatters.
+	Always call allow_end_sentence in @var and monospace, and call 
+	it when in @sc and the original output would not have been an 
+	uppercase letter.
+
+	* tp/texi2any.pl (BEGIN) <in-source run> Add directories for 
+	XSParagraph to @INC.
+	* tp/Makefile.am (AM_T_LOG_FLAGS): Add -I flags for XSParagraph.
+
+
 2015-06-26  Gavin Smith  <[email protected]>
 
 	* README-hacking: Notes on how to tag source tree and update 
Index: tp/Makefile.am
===================================================================
--- tp/Makefile.am	(revision 6362)
+++ tp/Makefile.am	(working copy)
@@ -175,6 +175,9 @@ T_LOG_DRIVER = env AM_TAP_AWK='$(AWK)' $(SHELL) \
                        $(top_srcdir)/build-aux/tap-driver.sh
 T_LOG_COMPILER = $(PERL)
 AM_T_LOG_FLAGS = -w
+AM_T_LOG_FLAGS += -ITexinfo/Convert/XSParagraph/lib
+AM_T_LOG_FLAGS += -ITexinfo/Convert/XSParagraph/blib/arch
+
 AM_TESTS_ENVIRONMENT = srcdir="$(srcdir)"; export srcdir; top_srcdir="$(top_srcdir)"; export top_srcdir;
 
 # just a convenience for running these additional tests.
Index: tp/Texinfo/Convert/Line.pm
===================================================================
--- tp/Texinfo/Convert/Line.pm	(revision 6362)
+++ tp/Texinfo/Convert/Line.pm	(working copy)
@@ -166,6 +166,9 @@ sub add_next($;$$$$)
   return $line->_add_next($word, undef, $space, $end_sentence, $transparent);
 }
 
+my $end_sentence_character = quotemeta('.?!');
+my $after_punctuation_characters = quotemeta('"\')]');
+
 # add a word and/or spaces and end of sentence.
 sub _add_next($;$$$$$)
 {
@@ -180,6 +183,10 @@ sub _add_next($;$$$$$)
   $underlying_word = $word if (!defined($underlying_word));
 
   if (defined($word)) {
+    my $disinhibit; # full stop after capital letter ends sentence
+    if ($word =~ s/\x08$//) {
+      $disinhibit = 1;
+    }
     if (!defined($line->{'word'})) {
       $line->{'word'} = '';
       $line->{'underlying_word'} = '';
@@ -194,7 +201,18 @@ sub _add_next($;$$$$$)
       }
     }
     $line->{'word'} .= $word;
-    $line->{'underlying_word'} .= $underlying_word unless ($transparent);
+
+    if (!$transparent) {
+      if ($disinhibit) {
+        $line->{'underlying_word'} = 'a';
+      } elsif ($word =~
+           /([^$end_sentence_character$after_punctuation_characters])
+            [$end_sentence_character$after_punctuation_characters]*$/x) {
+        # Save the last character in $word before punctuation
+        $line->{'underlying_word'} = $1;
+      }
+    }
+
     if ($line->{'DEBUG'}) {
       print STDERR "WORD+.L $word -> $line->{'word'}\n";
       print STDERR "WORD+.L $underlying_word -> $line->{'underlying_word'}\n";
@@ -231,6 +249,12 @@ sub inhibit_end_sentence($)
   $line->{'end_sentence'} = 0;
 }
 
+sub allow_end_sentence($)
+{
+  my $line = shift;
+  $line->{'underlying_word'} = 'a'; # lower-case
+}
+
 sub set_space_protection($$;$$$)
 {
   my $line = shift;
@@ -261,9 +285,6 @@ sub set_space_protection($$;$$$)
   return '';
 }
 
-my $end_sentence_character = quotemeta('.?!');
-my $after_punctuation_characters = quotemeta('"\')]');
-
 # wrap a text.
 sub add_text($$;$)
 {
@@ -316,16 +337,32 @@ sub add_text($$;$)
       }
     } elsif ($text =~ s/^(([^\s\p{InFullwidth}]|[\x{202f}\x{00a0}])+)//) {
       my $added_word = $1;
-      $underlying_text =~ s/^(([^\s\p{InFullwidth}]|[\x{202f}\x{00a0}])+)//;
-      my $underlying_added_word = $1;
 
-      $result .= $line->_add_next($added_word, $underlying_added_word);
-      # now check if it is considered as an end of sentence
-      if (defined($line->{'end_sentence'}) and 
-        $added_word =~ /^[$after_punctuation_characters]*$/) {
-        # do nothing in the case of a continuation of after_punctuation_characters
-      } elsif ($line->{'underlying_word'} =~ /[$end_sentence_character][$after_punctuation_characters]*$/
-           and $line->{'underlying_word'} !~ /[[:upper:]][$end_sentence_character$after_punctuation_characters]*$/) {
+      # Whether a sentence end is permitted in spite of a preceding
+      # upper case letter.
+      my $disinhibit = 0;
+
+      # Reverse the insertion of the control character in Plaintext.pm.
+      if ($added_word =~ s/\x08(?=[$end_sentence_character]
+        [$after_punctuation_characters]*$)//x) {
+        $disinhibit = 1;
+      }
+      $result .= _add_next($line, $added_word);
+
+      my $last_letter = $line->{'underlying_word'};
+
+      # Check if it is considered as an end of sentence.  There are two things
+      # to check: one, that we have a ., ! or ?; and second, that it is not
+      # preceded by an upper-case letter (ignoring some punctuation)
+      if (defined($line->{'end_sentence'})
+          and $added_word =~ /^[$after_punctuation_characters]*$/) {
+        # do nothing in the case of a continuation of 
+        # after_punctuation_characters
+      } elsif (($disinhibit
+                or !$last_letter
+                or $last_letter !~ /[[:upper:]]/)
+              and $added_word =~ /[$end_sentence_character]
+                                  [$after_punctuation_characters]*$/x) {
         if ($line->{'frenchspacing'}) {
           $line->{'end_sentence'} = -1;
         } else {
Index: tp/Texinfo/Convert/Paragraph.pm
===================================================================
--- tp/Texinfo/Convert/Paragraph.pm	(revision 6362)
+++ tp/Texinfo/Convert/Paragraph.pm	(working copy)
@@ -168,6 +168,9 @@ sub end($)
   return $result;
 }
 
+my $end_sentence_character = quotemeta('.?!');
+my $after_punctuation_characters = quotemeta('"\')]');
+
 sub add_next($;$$$$)
 {
   my $paragraph = shift;
@@ -176,25 +179,27 @@ sub add_next($;$$$$)
   my $end_sentence = shift;
   my $transparent = shift;
   $paragraph->{'end_line_count'} = 0;
-  return _add_next($paragraph, $word, undef, $space, $end_sentence, 
+  return _add_next($paragraph, $word, $space, $end_sentence, 
                    $transparent);
 }
 
 # add a word and/or spaces and end of sentence.
 sub _add_next($;$$$$$$)
 {
-  my $paragraph = $_[0];
-  my $word = $_[1];
-  my $space = $_[3];
-  my $end_sentence = $_[4];
-  my $transparent = $_[5];
+  my $paragraph = shift;
+  my $word = shift;
+  my $space = shift;
+  my $end_sentence = shift;
+  my $transparent = shift;
+  my $newlines_impossible = shift;
   my $result = '';
 
   if (defined($word)) {
-    my $underlying_word = $_[2];
-    my $newlines_impossible = $_[6];
-    $underlying_word = $word if (!defined($underlying_word));
-
+    my $disinhibit = 0;
+    # Reverse the insertion of the control character in Plaintext.pm.
+    if ($word =~ s/\x08$//) {
+      $disinhibit = 1;
+    }
     if (!defined($paragraph->{'word'})) {
       $paragraph->{'word'} = '';
       $paragraph->{'underlying_word'} = '';
@@ -212,7 +217,18 @@ sub _add_next($;$$$$$$)
     }
     
     $paragraph->{'word'} .= $word;
-    $paragraph->{'underlying_word'} .= $underlying_word unless($transparent);
+
+    if (!$transparent) {
+      if ($disinhibit) {
+        $paragraph->{'underlying_word'} = 'a';
+      } elsif ($word =~
+           /([^$end_sentence_character$after_punctuation_characters])
+            [$end_sentence_character$after_punctuation_characters]*$/x) {
+        # Save the last character in $word before punctuation
+        $paragraph->{'underlying_word'} = $1;
+      }
+    }
+
     if (!$newlines_impossible and $word =~ /\n/) {
       $result .= $paragraph->{'space'};
       $paragraph->{'space'} = '';
@@ -229,13 +245,7 @@ sub _add_next($;$$$$$$)
       if (defined($paragraph->{'word'})) {
         $para_word = $paragraph->{'word'};
       }
-      my $para_underlying_word = 'UNDEF';;
-      if (defined($paragraph->{'underlying_word'})) {
-        $para_underlying_word = $paragraph->{'word'};
-      }
-
       print STDERR "WORD+ $word -> $para_word\n";
-      print STDERR "UNDERLYING_WORD+ $underlying_word -> $para_underlying_word\n";
     }
     # The $paragraph->{'counter'} != 0 is here to avoid having an
     # additional line output when the text is longer than the max.
@@ -280,6 +290,13 @@ sub inhibit_end_sentence($)
   $paragraph->{'end_sentence'} = 0;
 }
 
+sub allow_end_sentence($)
+{
+  my $paragraph = shift;
+  printf STDERR "ALLOW END SENTENCE\n" if $paragraph->{'DEBUG'};
+  $paragraph->{'underlying_word'} = 'a'; # lower-case
+}
+
 sub set_space_protection($$;$$$)
 {
   my $paragraph = shift;
@@ -310,9 +327,6 @@ sub set_space_protection($$;$$$)
   return '';
 }
 
-my $end_sentence_character = quotemeta('.?!');
-my $after_punctuation_characters = quotemeta('"\')]');
-
 # wrap a text.
 sub add_text($$;$)
 {
@@ -347,13 +361,11 @@ sub add_text($$;$)
     }
     # \x{202f}\x{00a0} are non breaking spaces
     if (defined $spaces) {
-      $underlying_text =~ s/^([^\S\x{202f}\x{00a0}]+)//
-        if defined($underlying_text);
       print STDERR "SPACES($paragraph->{'counter'}) `"._print_escaped_spaces($spaces)."'\n" if $debug_flag;
       #my $added_word = $paragraph->{'word'};
       if ($protect_spaces_flag) {
         $paragraph->{'word'} .= $spaces;
-        $paragraph->{'underlying_word'} .= $spaces;
+        $paragraph->{'underlying_word'} = substr($spaces, -1);
         $paragraph->{'word_counter'} += length($spaces);
         #$paragraph->{'space'} .= $spaces;
         if ($paragraph->{'word'} =~ s/\n/ /g 
@@ -362,15 +374,9 @@ sub add_text($$;$)
            and $paragraph->{'end_sentence'} > 0) {
           $paragraph->{'word'} =~ /(\s*)$/;
           if (length($1) < 2) {
-            #$paragraph->{'word'} =~ s/(\s*)$/  /;
-            #$paragraph->{'underlying_word'} =~ s/(\s*)$/  /;
-            #my $removed = $1;
-            #$paragraph->{'word_counter'} += length('  ') - length($removed);
             my $added = ' ' x (2 - length($1));
             $paragraph->{'word'} .= $added;
-            $paragraph->{'word'} =~ /(\s*)$/;
-            my $end_spaces = $1;
-            $paragraph->{'underlying_word'} =~ s/(\s*)$/$end_spaces/;
+            $paragraph->{'underlying_word'} = ' ';
             $paragraph->{'word_counter'} += length($added);
           }
         }
@@ -422,23 +428,32 @@ sub add_text($$;$)
         $result .= _end_line($paragraph);
       }
     } elsif (defined $added_word) {
-      my $underlying_added_word;
-      if (defined($underlying_text)) {
-        $underlying_text =~ s/^(([^\s\p{InFullwidth}]|[\x{202f}\x{00a0}])+)//;
-        $underlying_added_word = $1;
-      } else {
-        $underlying_added_word = $added_word;
+      # Whether a sentence end is permitted in spite of a preceding
+      # upper case letter.
+      my $disinhibit = 0;
+
+      # Reverse the insertion of the control character in Plaintext.pm.
+      if ($added_word =~ s/\x08(?=[$end_sentence_character]
+                                  [$after_punctuation_characters]*$)//x) {
+        $disinhibit = 1;
       }
 
-      $result .= _add_next($paragraph, $added_word, $underlying_added_word,
-                           undef, undef, undef, !$newline_possible_flag);
+      $result .= _add_next($paragraph, $added_word, undef, undef,
+                           undef, !$newline_possible_flag);
 
-      # now check if it is considered as an end of sentence
+      my $last_letter = $paragraph->{'underlying_word'};
+
+      # Check if it is considered as an end of sentence.  There are two things
+      # to check: one, that we have a ., ! or ?; and second, that it is not
+      # preceded by an upper-case letter (ignoring some punctuation)
       if (defined($paragraph->{'end_sentence'})
-          and $underlying_added_word =~ /^[$after_punctuation_characters]*$/o) {
+          and $added_word =~ /^[$after_punctuation_characters]*$/o) {
         # do nothing in the case of a continuation of after_punctuation_characters
-      } elsif ($paragraph->{'underlying_word'} =~ /[$end_sentence_character][$after_punctuation_characters]*$/o
-           and $paragraph->{'underlying_word'} !~ /[[:upper:]][$end_sentence_character$after_punctuation_characters]*$/o) {
+      } elsif (($disinhibit
+                or !$last_letter
+                or $last_letter !~ /[[:upper:]]/)
+              and $added_word =~ /[$end_sentence_character]
+                                  [$after_punctuation_characters]*$/x) {
         if ($paragraph->{'frenchspacing'}) {
           $paragraph->{'end_sentence'} = -1;
         } else {
Index: tp/Texinfo/Convert/Plaintext.pm
===================================================================
--- tp/Texinfo/Convert/Plaintext.pm	(revision 6362)
+++ tp/Texinfo/Convert/Plaintext.pm	(working copy)
@@ -31,6 +31,10 @@ use Texinfo::Convert::Paragraph;
 use Texinfo::Convert::Line;
 use Texinfo::Convert::UnFilled;
 
+use XSParagraph;
+XSParagraph::hello ();
+# TODO: Run initialization code for XSParagraph implicitly.
+
 use Carp qw(cluck);
 
 require Exporter;
@@ -54,7 +58,7 @@ use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPO
 @EXPORT = qw(
 );
 
-$VERSION = '5.1.90';
+$VERSION = '6.0';
 
 # misc commands that are of use for formatting.
 my %formatting_misc_commands = %Texinfo::Convert::Text::formatting_misc_commands;
@@ -551,6 +555,34 @@ sub _output_old($$)
   return undef;
 }
 
+my $end_sentence = quotemeta('.?!');
+my $after_punctuation = quotemeta('"\')]');
+
+sub _protect_sentence_ends ($) {
+  my $text = shift;
+  # Avoid suppressing end of sentence, by inserting a control character
+  # in front of the full stop.  The choice of BS for this is arbitrary.
+  $text =~ s/(?<=[^[:upper:]])
+             (?=[$end_sentence][$after_punctuation]*(?:\s|$))
+             /\x08/gx;
+
+  # Also insert a control character at end of string, to protect a full stop 
+  # that may follow later.
+
+  #$text =~ s/(?<=[^[:upper:]][$after_punctuation]*)$/\x08/;
+  # Perl doesn't support "variable length lookbehind"
+
+  $text = reverse $text;
+  $text =~ s/^(?=[$after_punctuation]*
+                 (?:[^[:upper:]\s]|[\x{202f}\x{00a0}]))
+            /\x08/x;
+  $text = reverse $text;
+
+  return $text;
+}
+
+# Convert ``, '', `, ', ---, -- in $COMMAND->{'text'} to their output,
+# possibly converting to upper case as well.
 sub _process_text($$$)
 {
   my $self = shift;
@@ -558,30 +590,22 @@ sub _process_text($$$)
   my $context = shift;
   my $text = $command->{'text'};
 
+  if ($context->{'upper_case'}
+      or $self->{'formatters'}[-1]->{'var'}) {
+    $text = _protect_sentence_ends($text);
 
-  my $lower_case_text;
-  if ($context->{'upper_case'}) {
-    $lower_case_text = $text;
+    if ($self->{'debug'}) {
+      my $debug_text = $text;
+      $debug_text =~ s/\x08/!!/g;
+      print STDERR "markers:<$debug_text>\n";
+    }
+
     $text = uc($text);
   }
-  # Even if in upper case, in code style or @var always end a sentence.
-  if (#$context->{'code'} 
-      $context->{'font_type_stack'}->[-1]->{'monospace'}
-      or $context->{'var'}) {
-    $lower_case_text = lc($text);
-  }
+
   if ($self->{'to_utf8'}) {
-    if (defined($lower_case_text)) {
-      $lower_case_text 
-        = Texinfo::Convert::Unicode::unicode_text($lower_case_text, 
-          #$context->{'code'});
-          $context->{'font_type_stack'}->[-1]->{'monospace'});
-    }
-    return (Texinfo::Convert::Unicode::unicode_text($text, 
-            $context->{'font_type_stack'}->[-1]->{'monospace'}),
-            #$context->{'code'}),
-            $lower_case_text);
-  #} elsif (!$context->{'code'}) {
+    return Texinfo::Convert::Unicode::unicode_text($text, 
+            $context->{'font_type_stack'}->[-1]->{'monospace'});
   } elsif (!$context->{'font_type_stack'}->[-1]->{'monospace'}) {
     $text =~ s/---/\x{1F}/g;
     $text =~ s/--/-/g;
@@ -589,16 +613,8 @@ sub _process_text($$$)
     $text =~ s/``/"/g;
     $text =~ s/\'\'/"/g;
     $text =~ s/`/'/g;
-    if (defined($lower_case_text)) {
-      $lower_case_text =~ s/---/\x{1F}/g;
-      $lower_case_text =~ s/--/-/g;
-      $lower_case_text =~ s/\x{1F}/--/g;
-      $lower_case_text =~ s/``/"/g;
-      $lower_case_text =~ s/\'\'/"/g;
-      $lower_case_text =~ s/`/'/g;
-    }
   }
-  return ($text, $lower_case_text);
+  return $text;
 }
 
 sub new_formatter($$;$)
@@ -641,6 +657,7 @@ sub new_formatter($$;$)
     $container = Texinfo::Convert::Line->new($container_conf);
   } elsif ($type eq 'paragraph') {
     $container = Texinfo::Convert::Paragraph->new($container_conf);
+    #$container = XSParagraph->new($container_conf);
   } elsif ($type eq 'unfilled') {
     $container = Texinfo::Convert::UnFilled->new($container_conf);
   } else {
@@ -786,8 +803,7 @@ sub _count_added($$$)
   my $container = shift;
   my $text = shift;
 
-  #$self->_add_lines_count($container->end_line_count());
-  $self->{'count_context'}->[-1]->{'lines'} += $container->{'end_line_count'};
+  $self->_add_lines_count($container->end_line_count());
 
   #$self->_add_text_count($text);
   #$self->{'count_context'}->[-1]->{'bytes'} +=
@@ -1640,12 +1656,10 @@ sub _convert($$)
                                or $root->{'type'} eq 'last_raw_newline')) {
         $result = _count_added($self, $formatter->{'container'},
                     $formatter->{'container'}->add_next($root->{'text'}));
-      } elsif ($root->{'type'} and ($root->{'type'} eq 'underlying_text')) {
-        $formatter->{'container'}->add_underlying_text($root->{'text'});
       } else {
-        my ($text, $lower_case_text) = _process_text($self, $root, $formatter);
+        my $text = _process_text($self, $root, $formatter);
         $result = _count_added($self, $formatter->{'container'},
-                    $formatter->{'container'}->add_text($text, $lower_case_text));
+                    $formatter->{'container'}->add_text($text));
       }
       return $result;
     # the following is only possible if paragraphindent is set to asis
@@ -1754,19 +1768,22 @@ sub _convert($$)
       my $text;
       
       $text = Texinfo::Convert::Text::brace_no_arg_command($root, 
-                            {%{$self->{'convert_text_options'}}, 
-                             'sc' => $formatter->{'upper_case'}});
-      my $lower_case_text;
-      # always double spacing, so set underlying text lower case.
-      if ($formatter->{'var'} 
-          or $formatter->{'font_type_stack'}->[-1]->{'monospace'}) {
-        $lower_case_text = Texinfo::Convert::Text::brace_no_arg_command($root,
-                             {%{$self->{'convert_text_options'}},
-                              'lc' => 1});
-      } elsif ($formatter->{'upper_case'}) {
-        $lower_case_text = Texinfo::Convert::Text::brace_no_arg_command($root,
-                             $self->{'convert_text_options'});
+                                         $self->{'convert_text_options'});
+
+      # @AA{} should suppress an end sentence, @aa{} shouldn't.  This
+      # is the case whether we are in @sc or not.
+      if ($formatter->{'upper_case'}
+          and $letter_no_arg_commands{$root->{'cmdname'}}) {
+        $text = _protect_sentence_ends($text);
+        $text = uc($text);
+
+        if ($self->{'DEBUG'}) {
+          my $debug_text = $text;
+          $debug_text =~ s/\x08/!!/g;
+          print STDERR "accent markers:$debug_text\n";
+        }
       }
+
       if ($punctuation_no_arg_commands{$command}) {
         $result .= _count_added($self, $formatter->{'container'},
                     $formatter->{'container'}->add_next($text, undef, 1));
@@ -1776,25 +1793,28 @@ sub _convert($$)
             $formatter->{'container'}->set_space_protection(1,undef))
           if ($formatter->{'w'} == 1);
         $result .= _count_added($self, $formatter->{'container'}, 
-                       $formatter->{'container'}->add_text($text,
-                                                           $lower_case_text)); 
+                       $formatter->{'container'}->add_text($text));
         $formatter->{'w'}--;
         $result .= _count_added($self, $formatter->{'container'},
             $formatter->{'container'}->set_space_protection(0,undef))
           if ($formatter->{'w'} == 0);
       } else {
-        # This is to have @TeX{}, for example, be considered as tex
-        # as underlying text in order not to prevent end sentences.
+        $result .= _count_added($self, $formatter->{'container'}, 
+                       $formatter->{'container'}->add_text($text));
+
+        # This is so that @TeX{}, for example, does not prevent end sentences.
         if (!$letter_no_arg_commands{$command}) {
-          $lower_case_text = lc($text);
+          $formatter->{'container'}->allow_end_sentence();
         }
-        $result .= _count_added($self, $formatter->{'container'}, 
-                       $formatter->{'container'}->add_text($text,
-                                                           $lower_case_text)); 
+
         if ($command eq 'dots') {
           $formatter->{'container'}->inhibit_end_sentence();
         }
       }
+      if ($formatter->{'var'} 
+          or $formatter->{'font_type_stack'}->[-1]->{'monospace'}) {
+        $formatter->{'container'}->allow_end_sentence();
+      }
       return $result;
     # commands with braces
     } elsif ($accent_commands{$root->{'cmdname'}}) {
@@ -1808,18 +1828,22 @@ sub _convert($$)
       }
       my $accented_text 
          = Texinfo::Convert::Text::text_accents($root, $encoding, $sc);
-      my $accented_text_lower_case;
-      if ($formatter->{'var'} 
-          or $formatter->{'font_type_stack'}->[-1]->{'monospace'}) {
-        $accented_text_lower_case
-         = Texinfo::Convert::Text::text_accents($root, $encoding, -1);
-      } elsif ($formatter->{'upper_case'}) {
-        $accented_text_lower_case
+      $result .= _count_added($self, $formatter->{'container'},
+         $formatter->{'container'}->add_text($accented_text));
+
+      my $accented_text_original;
+      if ($formatter->{'upper_case'}) {
+        $accented_text_original
          = Texinfo::Convert::Text::text_accents($root, $encoding);
       }
-      $result .= _count_added($self, $formatter->{'container'},
-         $formatter->{'container'}->add_text($accented_text, 
-                                             $accented_text_lower_case));
+
+      if ($accented_text_original
+            and $accented_text_original !~ /[[:upper:]]/
+          or $formatter->{'var'} 
+          or $formatter->{'font_type_stack'}->[-1]->{'monospace'}) {
+        $formatter->{'container'}->allow_end_sentence();
+      }
+
       # in case the text added ends with punctuation.  
       # If the text is empty (likely because of an error) previous 
       # punctuation will be cancelled, we don't want that.
@@ -1907,6 +1931,7 @@ sub _convert($$)
       if ($code_style_commands{$command}) {
         #$formatter->{'code'}--;
         $formatter->{'font_type_stack'}->[-1]->{'monospace'}--;
+        $formatter->{'container'}->allow_end_sentence();
         pop @{$formatter->{'font_type_stack'}}
           if !$formatter->{'font_type_stack'}->[-1]->{'monospace'};
       } elsif ($regular_font_style_commands{$command}) {
@@ -1929,7 +1954,11 @@ sub _convert($$)
       }
       if ($upper_case_commands{$command}) {
         $formatter->{'upper_case'}--;
-        $formatter->{'var'}-- if ($command eq 'var');
+        if ($command eq 'var') {
+          $formatter->{'var'}--;
+          # Allow a following full stop to terminate a sentence.
+          $formatter->{'container'}->allow_end_sentence();
+        }
       }
       return $result;
     } elsif ($root->{'cmdname'} eq 'image') {
@@ -2207,13 +2236,13 @@ sub _convert($$)
                              $root->{'extra'}->{'brace_command_contents'}->[-1]});
           #print STDERR "".Data::Dumper->Dump([$prepended])."\n";
           unshift @{$self->{'current_contents'}->[-1]}, $prepended;
+          return '';
         } else {
-          # FIXME The underlying_text added is very ugly.  It leads to 'a'
-          # being prepended in the underlying word after the abbr or acronym,
-          # the intended effect being that a following period is always
-          # interpreted as ending a sentence.
-          unshift @{$self->{'current_contents'}->[-1]}, ($argument,
-                    {'type' => 'underlying_text', 'text' => 'a'});
+          $result = $self->_convert($argument);
+
+          # We want to permit an end of sentence, but not force it as @. does.
+          $formatter->{'container'}->allow_end_sentence();
+          return $result;
         }
       }
       return '';
Index: tp/Texinfo/Convert/UnFilled.pm
===================================================================
--- tp/Texinfo/Convert/UnFilled.pm	(revision 6362)
+++ tp/Texinfo/Convert/UnFilled.pm	(working copy)
@@ -88,6 +88,7 @@ sub _add_text($$)
 {
   my $line = shift;
   my $text = shift;
+  $text =~ s/\x08//g;
   if ($line->{'line_beginning'}) {
     if ($line->{'indent_length'}) {
       my $nspaces = $line->{'indent_length'} - $line->{'counter'};
@@ -176,6 +177,11 @@ sub inhibit_end_sentence($)
   my $line = shift;
 }
 
+sub allow_end_sentence($)
+{
+  my $line = shift;
+}
+
 sub set_space_protection($$;$$$)
 {
   return '';
Index: tp/texi2any.pl
===================================================================
--- tp/texi2any.pl	(revision 6362)
+++ tp/texi2any.pl	(working copy)
@@ -87,6 +87,10 @@ BEGIN
     # cause trouble if the modules are separately installed.
     push @INC, $texinfolibdir;
   }
+  push @INC, "${texinfolibdir}Texinfo/Convert/XSParagraph/lib";
+  push @INC, "${texinfolibdir}Texinfo/Convert/XSParagraph/blib/arch";
+  push @INC, "${texinfolibdir}Texinfo/Convert/XSParagraph";
+  use ExtUtils::testlib;
 
   # '@USE_EXTERNAL_LIBINTL @ and similar are substituted in the
   # makefile using values from configure
