markus schnalke <mei...@marmaro.de> writes:
>>     /usr/share/man/man1/rcsintro.1.gz:.TH RCSINTRO 1 \*(Dt GNU
>>     /usr/share/man/man1/saidar.1.gz:.TH saidar 1 $Date:\ 2006/11/30\ 23:42:42\ $ i\-scream
>> 
> The last line is such a case.

Handled in the patch.

> If you parse it char for char, then you can parse it

I meant that you can't reliably read fields from space-delimited text
when the fields mean different things and may themselves contain
spaces. Each field needs a quote to say BEGIN and a quote to say END
for:

    NAME SECTION DATE VERSION MANUAL
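
To illustrate the point, here is a rough sketch (not part of the patch)
that splits the two kinds of .TH lines with shellwords() from the core
Text::ParseWords module. The helper name th_fields is made up for this
example, and the saidar line is shown with its backslashes already
removed:

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Hypothetical helper, not in roffit: split the arguments of a .TH
# line on whitespace, keeping each "quoted string" as one field.
sub th_fields {
    my ($rest) = @_;
    return shellwords($rest);
}

# Quoted header: the quotes mark where each field begins and ends.
my @quoted = th_fields(q{curl 1 "22 Oct 2003" "Curl 7.10.8" "Curl Manual"});

# Unquoted header: the date falls apart into several tokens, so
# nothing says where DATE ends and VERSION begins.
my @unquoted = th_fields(q{saidar 1 $Date: 2006/11/30 23:42:42 $ i-scream});

printf "%d fields: %s\n", scalar @quoted,   join(" | ", @quoted);
printf "%d fields: %s\n", scalar @unquoted, join(" | ", @unquoted);
```

The quoted line yields exactly the five fields above. The unquoted one
yields more tokens than fields, and only the first two (NAME and
SECTION) can be assigned without guessing. Note that shellwords() also
treats apostrophes as quotes and returns an empty list on an unbalanced
quote, so this is only a demonstration, not something to drop into
roffit as is.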

> The most important thing is detecting the first two parameters

> ... First detect the first two arguments, which will succeed almost
> always.

Added a final ELSIF case for that. Daniel, please use this patch.

Jari

From 5675160c2b879b9d4b9b29e16224a8090ce32b0a Mon Sep 17 00:00:00 2001
From: Jari Aalto <jari.aa...@cante.net>
Date: Fri, 4 Jun 2010 10:12:23 +0300
Subject: [PATCH] roffit: improve TH handling
Organization: Private
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit


Signed-off-by: Jari Aalto <jari.aa...@cante.net>
---
 roffit |   52 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/roffit b/roffit
index 3149f37..ae55406 100755
--- a/roffit
+++ b/roffit
@@ -203,23 +203,61 @@ sub parsefile {
             $out = "";
             
             # cut off initial spaces
-            $rest =~ s/^ +//g;
+            $rest =~ s/^\s+//;
             
-            if($keyword eq "\\\"") {
+            if ( $keyword eq q(\\") ) {
                 # this is a comment, skip this line
             }
-            elsif($keyword =~ /^TH$/) {
+            elsif ( $keyword eq "TH" ) {
                 # man page header:
                 # curl 1 "22 Oct 2003" "Curl 7.10.8" "Curl Manual"
+
+		# Treat pages that have "*(Dt":
+		# .TH IDENT 1 \*(Dt GNU
+
+		$rest =~ s,\Q\\*(Dt,,g;
+
+		# Delete backslashes
+
+		$rest =~ s,\\,,g;
+
+		# Delete old RCS tags
+		# .TH saidar 1 $Date:\ 2006/11/30\ 23:42:42\ $ i\-scream
+
+		$rest =~ s,\$Date:\s+(.*?)\s+\$,$1,g;
+
                 # NAME SECTION DATE VERSION MANUAL
-                if($rest =~ /([^ ]*) (\d+) \"([^\"]*)\" \"([^\"]*)\"(\"([^\"]*)\")?/) {
+		# section can be: 1 or 3C
+
+                if ( $rest =~ /(\S+)\s+\"?(\d\S?+)\"?\s+\"([^\"]*)\" \"([^\"]*)\"(\"([^\"]*)\")?/ ) {
                     # strict matching only so far
-                    $manpage{'name'} = $1;
+                    $manpage{'name'}    = $1;
                     $manpage{'section'} = $2;
-                    $manpage{'date'} = $3;
+                    $manpage{'date'}    = $3;
                     $manpage{'version'} = $4;
-                    $manpage{'manual'} = $6;
+                    $manpage{'manual'}  = $6;
                 }
+	        # .TH html2text 1 2008-09-20 HH:MM:SS
+		elsif ( $rest =~  m, (\S+) \s+ \"?(\d\S?+)\"? \s+ \"?([ \d:/-]+)\"? \s* (.*) ,x )
+		{
+                    $manpage{'name'}    = $1;
+                    $manpage{'section'} = $2;
+                    $manpage{'date'}    = $3;
+                    $manpage{'manual'}  = $4;
+		}
+		# .TH program 1 description
+		elsif ( $rest =~ /(\S+) \s+ \"?(\d\S?+)\"? \s+ (.+)/x )
+		{
+                    $manpage{'name'}    = $1;
+                    $manpage{'section'} = $2;
+                    $manpage{'manual'}  = $3;
+		}
+		# .TH program 1
+		elsif ( $rest =~ /(\S+) \s+ \"?(\d\S?+)\"? /x )
+		{
+                    $manpage{'name'}    = $1;
+                    $manpage{'section'} = $2;
+		}
             }
             elsif($keyword =~ /^S[HS]$/) {
                 # SS is treated the same as SH
-- 
1.7.1
