Vyacheslav Karamov wrote:
> Hi All!
> 
> I need to capture citation numbers, but I am getting extra values. I need to
> capture citations, not figures, chapters and so on.
> For example, in "[see 9; Figure 7]", only "9", i.e. the citation number, must
> be captured.
> 
> 
> my $regex = qr
> {
>     (?i)                        # Case-insensitive
>     [\p{IsAlpha}\.\s]*          # Any number of letters, dots and/or spaces (greedy)
>     (
>         [\x{2022}\*]*           # Any number of bullet or asterisk characters
>         [1-9]+                  # One or more digits 1-9
>         \s*                     # Any number of spaces
>         (?:
>             \-|\,|through|and   # Any number of dash, comma or "through" or "and"
>         )*?
>         \s*                     # Any number of spaces
>     )+
>     [\,\;]?                     # One or more comma or semicolon
>     \s*                         # Any number of spaces
>     (?:
>         (?:
>             figure | fig[s]?[\.]?? | table | box | chapter | diagram |
>             scheme | chart | plate | appendix | part | section |
>             footnote | [p]{1,2}\.?? | page
>         )
>         \s*                     # Any number of spaces
>         [1-9]+                  # One or more digits 1-9
>     )*?
> }msx;
> my @vancouverCites =
> (
> "[5, Figure 3]",
> "[8, Chapter 60]",
> "[9 through 15, pp. 35 - 46]",
> "[11, pp. 37 Through 47]",
> "[see 1, 4]",
> "[e.g. 2, 5]",
> "[e.g. •2, ••5]",
> "[e.g. *2, **5]",
> "[for example 1,17]",
> "[2, 9]",
> );
> 
> foreach my $cite (@vancouverCites)
> {
>     my @matches = $cite =~ /$regex/g;
>     foreach my $arr (@matches)
>     {
>         print "$arr\n" if defined $arr;
>     }
> }
> 
> 
> Script output:
> 
> 5  
> 3  - wrong
> 8
> 6 - wrong. I don't understand why 6 instead of 60 was captured. 
> Actually, only 8 is correct
> 9
> 15
> 35  - wrong
> 46  - wrong
> 11
> 37  - wrong
> 47  - wrong
> 1
> 4
> 2
> 5
> •2
> ••5
> *2
> **5
> 1
> 17
> 2
> 9

Your immediate problem is that you have used the character class [1-9] twice
where you need [0-9]. [1-9] cannot match a zero, so the match against 60 stops
after the 6. However, I think there may well be more problems with your regex
than that. Is your list a complete set of the citations that you expect to
match? If so, why not try to match the square brackets?
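
To make both points concrete, here is a rough sketch rather than a finished
solution. It first shows the [1-9] versus [0-9] difference on "Chapter 60",
then anchors on the square brackets and discards everything from the first
non-citation keyword onwards. The helper name cite_numbers is made up for this
example, the keyword list is copied from your regex, and I am assuming that
bullets and asterisks in front of a number can simply be dropped and that
ranges such as "9 through 15" should yield only their endpoints.

```perl
use strict;
use warnings;

# [1-9]+ cannot match a zero, so it stops after the 6 in "60".
my ($short) = "Chapter 60" =~ /([1-9]+)/;    # captures "6"
my ($full)  = "Chapter 60" =~ /([0-9]+)/;    # captures "60"
print "$short vs $full\n";

# Hypothetical helper: return the citation numbers found in one
# bracketed cite. Call it in list context.
sub cite_numbers {
    my ($cite) = @_;

    # Anchor on the square brackets and keep only what is inside them.
    my ($body) = $cite =~ /\[ ([^\]]*) \]/x or return;

    # Cut everything from the first figure/chapter/page keyword onwards,
    # together with the comma or semicolon in front of it.
    $body =~ s/ [,;]? \s* \b
                (?: figure | figs? | table | box | chapter | diagram
                  | scheme | chart | plate | appendix | part | section
                  | footnote | pp? | page )
                \b .* //xis;

    # Whatever digits remain are citation numbers.
    return $body =~ /([0-9]+)/g;
}

for my $cite ("[see 9; Figure 7]", "[8, Chapter 60]",
              "[9 through 15, pp. 35 - 46]") {
    print join(", ", cite_numbers($cite)), "\n";
}
```

Splitting the job into "find the brackets", "strip the trailing reference" and
"collect the digits" keeps each pattern short enough to check by eye, which is
hard to do with one large regex.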

HTH,

Rob
