Re: regex

2024-01-24 Thread karl
Mike:
> I stand properly scolded.

I didn't want to scold anyone; it seems I expressed myself badly.
Sorry for that.

Regards,
/Karl Hammar


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: regex

2024-01-23 Thread Mike



I stand properly scolded.


Mike


On 1/23/24 07:01, k...@aspodata.se wrote:

Please stop using my mail address when replying, I'm on the list and
don't want two copies of the same mail (it's not about you Mike).







Re: regex

2024-01-23 Thread karl
Please stop using my mail address when replying, I'm on the list and
don't want two copies of the same mail (it's not about you Mike).

Mike 
> Why is my Perl not working on that command?
> 
> $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> Unescaped left brace in regex is illegal here in regex; marked by <-- 
> HERE in m/a{ <-- HERE ,2}/ at -e line 1.
> $
> 
> But this works:
> $ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'
> $
> 
> $ echo $?
> 10
> $

 On an old Debian woody box I get:
$ perl -v | grep v5
This is perl, v5.6.1 built for i386-linux
$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'; echo $?
0
$ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'; echo $?
10

$ man perlre
...
   The following standard quantifiers are recognized:

   *  Match 0 or more times
   +  Match 1 or more times
   ?  Match 1 or 0 times
   {n}    Match exactly n times
   {n,}   Match at least n times
   {n,m}  Match at least n but not more than m times
...

 So old perl versions don't have the {,m} quantifier; check your
documentation for that. The easy way out is to always use {0,m} instead
of {,m}: they are the same thing in modern perl, so there is actually
never any need to use the {,m} quantifier.

I don't know why I don't get a perl error message above; maybe it's a bug.
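A likely explanation, consistent with the perlre note Andy Bach quotes elsewhere in this thread (a curly bracket outside a recognized quantifier "is treated as a regular character"), is that perl 5.6.1 read /a{,2}/ as the literal text "a{,2}", which "aaa" does not contain. I have not verified that on 5.6.1 itself, so treat it as an assumption. The sketch below just spells out each reading explicitly so it behaves the same on old and new perls (the sub names are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Literal reading: the five characters "a{,2}".
# Escaping the braces works the same on old and new perls.
sub literal_match    { return $_[0] =~ /a\{,2\}/ ? 1 : 0 }

# Quantifier reading: zero to two "a"s. {0,2} works on every perl,
# while {,2} is only accepted by newer ones (5.34.1 accepts it in this
# thread, 5.30.0 rejects it).
sub quantifier_match { return $_[0] =~ /a{0,2}/  ? 1 : 0 }

for my $s ('aaa', 'a{,2}') {
    printf "%-6s literal=%d quantifier=%d\n",
        $s, literal_match($s), quantifier_match($s);
}
```

Both sample strings satisfy the {0,2} form, since zero a's is enough; only "a{,2}" satisfies the escaped, literal form.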

///

 On a more up-to-date system I get:
$ perl -v | grep v5
This is perl 5, version 34, subversion 1 (v5.34.1) built for 
x86_64-linux-thread-multi
$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'; echo $?
10
$ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'; echo $?
10

///

 If you are interested in the syntax rules, check under "Simple 
statements" in:

 (perl 5.6.1)
$ man perlsyn
   Any simple statement may optionally be followed by a SINGLE
   modifier, just before the terminating semicolon (or
   block ending).  The possible modifiers are:

   if EXPR
   unless EXPR
   while EXPR
   until EXPR
   foreach EXPR

...

 (perl 5.34.1)
$ man perlsyn
...
   Statement Modifiers
   Any simple statement may optionally be followed by a SINGLE modifier,
   just before the terminating semicolon (or block ending).  The possible
   modifiers are:

   if EXPR
   unless EXPR
   while EXPR
   until EXPR
   for LIST
   foreach LIST
   when EXPR

...

So modern perl also has "for" and "when".

///

Also note that in a compound statement you have to put parentheses
around the EXPR, as in

 if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK

whereas with a statement modifier you don't need to:

 STATEMENT if EXPR;

I prefer to always use () around the expression, since it makes it
easier to convert between the two forms.
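A minimal sketch of both forms side by side, keeping the condition in parentheses in each so it can be moved between them unchanged (the sub names are illustrative, and {0,2} is used so it also runs on perls before 5.34):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Statement-modifier form: the parentheses around the condition are optional.
sub modifier_form {
    my ($s) = @_;
    return 10 if ($s =~ /^a{0,2}$/);
    return 0;
}

# Compound-statement form: the parentheses are required by the syntax.
sub compound_form {
    my ($s) = @_;
    if ($s =~ /^a{0,2}$/) {
        return 10;
    }
    else {
        return 0;
    }
}

for my $s ('aa', 'aaa') {
    print "$s: ", modifier_form($s), ' ', compound_form($s), "\n";
}
```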

Regards,
/Karl Hammar







Re: regex

2024-01-22 Thread Mike


Why is my Perl not working on that command?

$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
Unescaped left brace in regex is illegal here in regex; marked by <-- 
HERE in m/a{ <-- HERE ,2}/ at -e line 1.

$

But this works:
$ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'
$

$ echo $?
10
$

It sure surprised me that the first one did not work for me.

Do I need to upgrade my Perl?
$ perl -v

This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux
(with 1 registered patch, see perl -V for more detail)
snip
$

I just went through my Perl documentation and none of
it allows {,2}.  Learning Perl Second Edition (July 1997)
says:
"If you leave off the second number, as in /x{5,}/, it means "that many 
or more" (five or more in this case), and if you leave off the comma, as 
in /x{5}/, it means "exactly this many" (five x's).

To get five or less x's, you must put the zero in, as in /x{0,5}/."


Mike


On 1/22/24 06:23, Jorge Almeida wrote:

Please help me to understand this:
$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
$ echo $?
$ 10

Thanks

Jorge Almeida




Re: regex

2024-01-22 Thread armando perez pena
Hi,

Sometimes the long road is the shortest one. Go through the Perl tutorial
for regular expressions (perlretut) and you will solve your questions and
learn a lot.

There are two points of view about regular expressions. The first one says
that you must learn and use them.

The other point of view is: if you have a problem and you say "I will solve
it with regular expressions", then you have two problems.

Keep at it!
Regards

From: Claude Brown via beginners 
Sent: Monday, January 22, 2024 10:49:50 PM
To: k...@aspodata.se ; beginners@perl.org 
Subject: RE: regex

Jorge,

Expanding on Karl's answer (and somewhat labouring his point) consider these 
examples:

$a =~ /Jorge/
$a =~ /^Jorge/
$a =~ /Jorge$/
$a =~ /^Jorge$/

This shows that a regex provides four different capabilities:
- detect "Jorge" anywhere in the string
- detect "Jorge" at the start of a string (by adding ^)
- detect "Jorge" at the end of a string (by adding $)
- detect that the string is exactly "Jorge" (both ^ and $)

Replace "Jorge" with your pattern, and the result is the same.

Cheers,

Claude.









RE: regex

2024-01-22 Thread Claude Brown via beginners
Jorge,

Expanding on Karl's answer (and somewhat labouring his point) consider these 
examples:

$a =~ /Jorge/
$a =~ /^Jorge/
$a =~ /Jorge$/
$a =~ /^Jorge$/

This shows that a regex provides four different capabilities:
- detect "Jorge" anywhere in the string
- detect "Jorge" at the start of a string (by adding ^)
- detect "Jorge" at the end of a string (by adding $)
- detect that the string is exactly "Jorge" (both ^ and $)

Replace "Jorge" with your pattern, and the result is the same.
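A small sketch that runs Claude's four patterns against a few sample strings (the strings and the anchors() helper are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Which of the four patterns does a given string satisfy?
sub anchors {
    my ($s) = @_;
    return (
        anywhere => ($s =~ /Jorge/   ? 1 : 0),
        start    => ($s =~ /^Jorge/  ? 1 : 0),
        end      => ($s =~ /Jorge$/  ? 1 : 0),
        exact    => ($s =~ /^Jorge$/ ? 1 : 0),
    );
}

for my $s ('Jorge', 'Jorge Almeida', 'Hi Jorge', 'Hi Jorge!') {
    my %r = anchors($s);
    print "$s: ", join(' ', map { "$_=$r{$_}" } qw(anywhere start end exact)), "\n";
}
```

Note that $ also matches just before a trailing newline, so "Jorge\n" counts as an exact match; use \z instead of $ if that matters.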

Cheers,

Claude.









Re: regex

2024-01-22 Thread Levi Elias Nystad-Johansen via beginners
I agree that this is confusing, and I think the many resources that describe
regex in unhelpful ways are partly to blame: descriptions like "pattern that
matches against a string" and similar.

This implies that a regex has to match the whole string, but this is not the case.
A regex does not have to match the string; instead, the string has to satisfy
the regex.

"aaa" satisfies /a{,2}/ because it contains everything the regex requires.
Thinking of regex in this way has been a help to me, at least.

-L

 Original Message 
On 22. jan. 2024, 13:23, Jorge Almeida wrote:

> Please help me to understand this:
> $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> $ echo $?
> $ 10
>
> Thanks
>
> Jorge Almeida

Re: regex

2024-01-22 Thread Andy Bach
Yes, the {} RE quantifier has the canonical form
{a,b}, where a and b are numbers; it modifies the char before it to
match from a to b times, e.g.
A{1,3}

matches one, two or three As.  If you leave out the first number, zero is
presumed. Hmm, perl 5.30:
% perl -E 'say(10) if "aaa"=~/a{,2}/;'
Unescaped left brace in regex is illegal here in regex; marked by <-- HERE
in m/a{ <-- HERE ,2}/ at -e line 1.

and
% perldoc perlre

says
   Quantifiers
Quantifiers are used when a particular portion of a pattern needs to
match a certain number (or numbers) of times. If there isn't a
quantifier the number of times to match is exactly one. The following
standard quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

(If a non-escaped curly bracket occurs in a context other than one of
the quantifiers listed above, where it does not form part of a
backslashed sequence like "\x{...}", it is either a fatal syntax error,
or treated as a regular character, generally with a deprecation warning
raised. To escape it, you can precede it with a backslash ("\{") or
enclose it within square brackets ("[{]"). This change will allow for
future syntax extensions (like making the lower bound of a quantifier
optional), and better error checking of quantifiers).

On Mon, Jan 22, 2024 at 6:59 AM  wrote:

> Jorge Almeida:
> > Please help me to understand this:
> > $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> > $ echo $?
> > $ 10
>
> In man perlre, under "Regular Expressions" it says:
>
>   {,n}   Match at most n times
>
> So /a{,2}/ matches "", "a", and "aa" and is ignorant about what
> comes before and after (basically). That "aa" is followed by a
> "a" isn't something the expression prohibits. If you want that
> try /^a{,2}$/ instead.
>
> Regards,
> /Karl Hammar
>

-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk


Re: regex

2024-01-22 Thread karl
Jorge Almeida:
> On Mon, 22 Jan 2024 at 13:00,  wrote:
> > Jorge Almeida:
> > > $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
...
> >   {,n}   Match at most n times
...
> Yes, I read it (several times). I still don't understand it (I understand
> what you're saying, and I trust you're right, I just don't understand how
> this behaviour matches the description above--- "at most", really?)

Just think of it like this:
 there are three diamonds on the table;
 can you find zero, one, or preferably two diamonds there?
...
> Now, in
> perl -e 'print $1,"\n" if "aaa"=~/(a{,2})/;'
> $ aa
> this is understandable. More or less. Maybe the semantics of /a{,2}/ should
> be described as "match any number of consecutive 'a' whatsoever and capture
> at most 2  'a' characters...

No, it just looks at the first two a's and finds a match. There is
still one "a" left, but who cares? You have already got your match.
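To see what Karl describes, the sketch below reports what an unanchored match captured, the offset where it started (from the @- array) and how many characters follow it (from @+). It uses {0,2} so it also runs on perls older than 5.34; first_match() is a made-up name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report what an unanchored a{0,2} actually matched inside a string.
sub first_match {
    my ($s) = @_;
    return unless $s =~ /(a{0,2})/;
    # $-[0] is where the whole match started, $+[0] where it ended.
    return ($1, $-[0], length($s) - $+[0]);
}

my ($matched, $offset, $left_over) = first_match('aaa');
print "matched '$matched' at offset $offset, $left_over char(s) left after it\n";
```

For "aaa" this reports "aa" at offset 0 with one character left over. For a string such as "baa" it succeeds at offset 0 with an empty match, since zero a's is allowed.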

Regards,
/Karl Hammar







Re: regex

2024-01-22 Thread karl
Jorge Almeida:
> Please help me to understand this:
> $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> $ echo $?
> $ 10

In man perlre, under "Regular Expressions" it says:

  {,n}   Match at most n times

So /a{,2}/ matches "", "a", and "aa" and is ignorant about what 
comes before and after (basically). That "aa" is followed by an
"a" isn't something the expression prohibits. If you want to prohibit
that, try /^a{,2}$/ instead.
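A sketch contrasting the two, written with {0,2} for older perls and with \A and \z rather than ^ and $ (a slightly stricter variant of Karl's suggestion, since $ also accepts a trailing newline); the sub names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored: true if the string contains zero to two "a"s somewhere,
# which every string does, because zero is allowed.
sub contains_upto_two { return $_[0] =~ /a{0,2}/     ? 1 : 0 }

# Anchored: true only if the whole string is zero to two "a"s.
sub is_upto_two       { return $_[0] =~ /\Aa{0,2}\z/ ? 1 : 0 }

for my $s ('', 'a', 'aa', 'aaa', 'xyz') {
    printf "%-5s contains=%d whole=%d\n",
        "'$s'", contains_upto_two($s), is_upto_two($s);
}
```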

Regards,
/Karl Hammar







Re: Regex to detect natural language fragment

2021-09-14 Thread Julius Hamilton
 Thanks very much.

@Chankey Pathak, which of those libraries do you recommend for this task?

Best regards,
Julius

On Tue, Sep 14, 2021 at 2:33 AM Ken Peng  wrote:

> Or use GPT-3, which has a free online API.
> https://openai.com/blog/openai-api/
>
> regards
>
> On Mon, Sep 13, 2021 at 11:42 PM Chankey Pathak 
> wrote:
>
>> You can look into NLP https://metacpan.org/search?q=nlp
>>
>> On Mon, 13 Sept 2021 at 21:04, Julius Hamilton <
>> juliushamilton...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> I'm not sure if this is possible, and if it's not, I'll explore a better
>>> way to do this.
>>>
>>> I would like to write a script which analyzes if a line of text is
>>> (likely) a broken natural language sentence, i.e., it is probably part of a
>>> sentence, even if the start or end is not present, rather than it being a
>>> fully "complete" linguistic entity, for example, a header of a section,
>>> which does not have a period at the end and is not really a sentence, yet
>>> is in a complete and unbroken form.
>>>
>>> I'm pretty sure in principle this will require some kind of syntax
>>> parsing. I think I read somewhere regular expressions for some mathematical
>>> reason cannot parse tree / nested structures, for example HTML.
>>>
>>> Does anyone know what the next most ubiquitous, standard tool is for
>>> analyzing nested linguistic structures? Is that an XML parser?
>>>
>>> Thanks very much,
>>> Julius
>>>
>>


Re: Regex to detect natural language fragment

2021-09-13 Thread Ken Peng
Or use GPT-3, which has a free online API.
https://openai.com/blog/openai-api/

regards

On Mon, Sep 13, 2021 at 11:42 PM Chankey Pathak 
wrote:

> You can look into NLP https://metacpan.org/search?q=nlp
>
> On Mon, 13 Sept 2021 at 21:04, Julius Hamilton <
> juliushamilton...@gmail.com> wrote:
>
>> Hey,
>>
>> I'm not sure if this is possible, and if it's not, I'll explore a better
>> way to do this.
>>
>> I would like to write a script which analyzes if a line of text is
>> (likely) a broken natural language sentence, i.e., it is probably part of a
>> sentence, even if the start or end is not present, rather than it being a
>> fully "complete" linguistic entity, for example, a header of a section,
>> which does not have a period at the end and is not really a sentence, yet
>> is in a complete and unbroken form.
>>
>> I'm pretty sure in principle this will require some kind of syntax
>> parsing. I think I read somewhere regular expressions for some mathematical
>> reason cannot parse tree / nested structures, for example HTML.
>>
>> Does anyone know what the next most ubiquitous, standard tool is for
>> analyzing nested linguistic structures? Is that an XML parser?
>>
>> Thanks very much,
>> Julius
>>
>


Re: Regex to detect natural language fragment

2021-09-13 Thread Chankey Pathak
You can look into NLP https://metacpan.org/search?q=nlp

On Mon, 13 Sept 2021 at 21:04, Julius Hamilton 
wrote:

> Hey,
>
> I'm not sure if this is possible, and if it's not, I'll explore a better
> way to do this.
>
> I would like to write a script which analyzes if a line of text is
> (likely) a broken natural language sentence, i.e., it is probably part of a
> sentence, even if the start or end is not present, rather than it being a
> fully "complete" linguistic entity, for example, a header of a section,
> which does not have a period at the end and is not really a sentence, yet
> is in a complete and unbroken form.
>
> I'm pretty sure in principle this will require some kind of syntax
> parsing. I think I read somewhere regular expressions for some mathematical
> reason cannot parse tree / nested structures, for example HTML.
>
> Does anyone know what the next most ubiquitous, standard tool is for
> analyzing nested linguistic structures? Is that an XML parser?
>
> Thanks very much,
> Julius
>


Re: regex help - only one value returned

2020-12-02 Thread Jim Gibson
In your original example:

print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);

the interior parentheses in example one terminate the alternation, so the last
alternative is 'sir'.

In example two, the alternation is not terminated until the first ')', so the
last alternative is 'sir .{5,}?', and each alternative (including 'miss') is
followed in the regular expression by the "\n" character. Since in $T 'Miss'
is not followed by a "\n", the match fails. Vlado has explained how to group
and terminate the alternation without capturing the match result.
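A reduced version of Gary's test, as a sketch (the sample text is shortened from his heredoc), showing that only the grouped pattern matches:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text shortened from Gary's heredoc.
my $T = "Customer name and address\nMiss Jayne Doe\n19 Their Street\n";

# Ungrouped: the alternatives are "mr", "mrs", "miss", "dr", "prof" and
# "sir .{5,}?", and whichever one matches must be followed by "\n".
my $ungrouped = $T =~ /^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi ? 1 : 0;

# Grouped: the inner parentheses close the alternation, so " .{5,}?\n"
# follows whichever title matched.
my $grouped = $T =~ /^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi ? 1 : 0;

print "ungrouped=$ungrouped grouped=$grouped\n";
```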


> On Dec 2, 2020, at 6:08 AM, Gary Stainburn  
> wrote:
> 
> On 02/12/2020 13:56, Vlado Keselj wrote:
>> Well, it seems that the first one is what you want, but you just need to
>> use $1 and ignore $2.
>> 
>> You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not
>> want for them to be captured in $2, you can use:
>> '(?:mr|mrs|miss|dr|prof|sir)'.  For example:
>> 
>> print "match3='$1' '$2'\n" if
>> ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
>> 
>> would give output:
>> 
>> match3='Miss Jayne Doe' ''
> Perfect, thank you.
> 
> I can't ignore $2 as it's in a loop with other regex that genuinely returns 
> multiple matches.  The amendment to the REGEX worked perfectly.

It is always best to save the results of a match with capturing in another 
variable. The capturing variables $1, $2, etc. are not reassigned if a match 
fails, so if you use them after a failed match, they will be the values left 
over from a previous match. So do this:

my $name = $1;
my $salutation = $2;

If you don't want a possible undefined value, do this instead:

my $salutation = $2 || '';
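One way to apply that advice, sketched here rather than taken from Jim's message: assign the captures in list context inside the condition, so the variables are only set when the match succeeds (parse_name() is a made-up helper):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (name, title) on success, or an empty list on failure, so stale
# $1/$2 values from an earlier successful match can never leak through.
sub parse_name {
    my ($text) = @_;
    if (my ($name, $title) = $text =~ /^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi) {
        return ($name, $title);
    }
    return;
}

my $T = "Customer name and address\nMiss Jayne Doe\n19 Their Street\n";
my ($name, $title) = parse_name($T);
print defined $name ? "name='$name' title='$title'\n" : "no match\n";
```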


Jim Gibson
j...@gibson.org





Re: regex help - only one value returned

2020-12-02 Thread Gary Stainburn

On 02/12/2020 13:56, Vlado Keselj wrote:

Well, it seems that the first one is what you want, but you just need to
use $1 and ignore $2.

You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not
want for them to be captured in $2, you can use:
'(?:mr|mrs|miss|dr|prof|sir)'.  For example:

print "match3='$1' '$2'\n" if
($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

would give output:

match3='Miss Jayne Doe' ''

Perfect, thank you.

I can't ignore $2 as it's in a loop with other regex that genuinely 
returns multiple matches.  The amendment to the REGEX worked perfectly.


Gary





Re: regex help - only one value returned

2020-12-02 Thread Vlado Keselj


Well, it seems that the first one is what you want, but you just need to 
use $1 and ignore $2.

You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not 
want for them to be captured in $2, you can use:
'(?:mr|mrs|miss|dr|prof|sir)'.  For example:

print "match3='$1' '$2'\n" if
($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

would give output:

match3='Miss Jayne Doe' ''
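A sketch of Vlado's fix in list context, using sample text shortened from Gary's heredoc: with (?:...) the match returns a single capture. Under use warnings, interpolating the now-undefined $2 as in the match3 line should also produce an "uninitialized value" warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $T = "Customer name and address\nMiss Jayne Doe\n19 Their Street\n";

# (?:...) groups the alternation without creating a capture group,
# so the whole name is the only value captured.
my @caps = $T =~ /^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi;

print scalar(@caps), " capture(s): '$caps[0]'\n";
```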

On Wed, 2 Dec 2020, Gary Stainburn wrote:

> I have an array of regex expressions that I apply to text returned from
> tesseract.
> 
> Each match that I get then gets stored for future processing. However, I'm
> struggling with one regex.
> 
> The problem is that:
> 
> 1) with brackets round the titles it returns two matches.
> 2) without brackets, it returns nothing.
> 
> Can anyone point me at the correct syntax please.
> 
> Gary
> 
> [root@dev dev]# ./t
> match1='Miss Jayne Doe' 'Miss'
> [root@dev dev]# cat t
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> 
> my $T=<<EOF;
> Customer name and address
> Miss Jayne Doe
> 19 Their Street
> Somewehere
> In Yorkshire
> IN1 3YY
> EOF
> 
> print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
> print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);
> [root@dev dev]#
> 




Re: Regex for date

2018-08-25 Thread Chris Charley

"Asad"  wrote in message 
news:cag3lskh4dphjg18c-jxmo8bcqfd+vix5tep1ytsp4_6pd6z...@mail.gmail.com...
Hi All , 

  I need  a regex to match the date : Sat Aug 25 08:41:03 2018 and 
convert into a format: '%m/%d/%Y %H:%M:%S'

Thanks, 


-- 

Asad Hasan
+91 9582111698

Hello Asad,

You could use Time::Piece to do this. (Although, given a choice, I would use 
'%Y/%m/%d %H:%M:%S', which sorts naturally as a string.)

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';   # say() is not enabled by default in a script
use Time::Piece;

my $d = 'Sat Aug 25 08:41:03 2018';

my $dt = Time::Piece->strptime($d, '%a %b %d %H:%M:%S %Y');

say $dt->strftime('%m/%d/%Y %H:%M:%S');

Re: Regex for date

2018-08-25 Thread Jim Gibson
Many Perl modules have been written to parse and manipulate dates and times. 
Some come with Perl; others are available at www.cpan.org.

Check out the Date::Manip, Date::Parse, or DateTime modules.

> On Aug 25, 2018, at 4:06 AM, Asad  wrote:
> 
> Hi All ,
> 
> I need  a regex to match the date : Sat Aug 25 08:41:03 2018 and 
> convert into a format: '%m/%d/%Y %H:%M:%S'
> 
> Thanks, 
> 
> -- 
> Asad Hasan
> +91 9582111698



Jim Gibson





Re: Regex for date

2018-08-25 Thread Asad
Thanks, I'll check them out.

On Sat, Aug 25, 2018 at 4:53 PM Home Linux Info 
wrote:

>
> Hello,
>
> Maybe not the most beautiful regex out there, hey I'm a noob, but it does
> the job right:
> ([A-Z][a-z]{2}\s)|([0-9]{2}\s[0-2][0-9](:[0-5][0-9]){2}\s[0-9]{4})
> You can start from here and find a nicer form of this regex.
>
> On 8/25/18 2:06 PM, Asad wrote:
>
> Hi All ,
>
>   I need  a regex to match the date : Sat Aug 25 08:41:03 2018 and
> convert into a format: '%m/%d/%Y %H:%M:%S'
>
> Thanks,
>
> --
> Asad Hasan
> +91 9582111698
>
>
>

-- 
Asad Hasan
+91 9582111698


Re: Regex for date

2018-08-25 Thread Home Linux Info


Hello,

Maybe not the most beautiful regex out there, hey I'm a noob, but it 
does the job right:

([A-Z][a-z]{2}\s)|([0-9]{2}\s[0-2][0-9](:[0-5][0-9]){2}\s[0-9]{4})
You can start from here and find a nicer form of this regex.
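One way to build on this, sketched under the assumption that the input always has the ctime-style layout shown (weekday, month, day, time, year): capture each field separately, map the month name to a number, and reorder. convert_date() and %month_num are illustrative names; the Time::Piece approach suggested elsewhere in the thread avoids the hand-written table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %month_num = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Convert "Sat Aug 25 08:41:03 2018" to "08/25/2018 08:41:03".
# Returns undef if the string does not have the expected layout.
sub convert_date {
    my ($s) = @_;
    my ($mon, $day, $time, $year) = $s =~ m{
        \A [A-Z][a-z]{2}      \s+   # weekday name, ignored
        ([A-Z][a-z]{2})       \s+   # month name
        (\d{1,2})             \s+   # day of month
        (\d{2}:\d{2}:\d{2})   \s+   # HH:MM:SS
        (\d{4}) \z                  # year
    }x or return;
    my $m = $month_num{$mon} or return;   # unknown month name
    return sprintf '%02d/%02d/%04d %s', $m, $day, $year, $time;
}

print convert_date('Sat Aug 25 08:41:03 2018'), "\n";
```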

On 8/25/18 2:06 PM, Asad wrote:

Hi All ,

          I need  a regex to match the date : Sat Aug 25 08:41:03 2018 
and convert into a format: '%m/%d/%Y %H:%M:%S'


Thanks,

--
Asad Hasan
+91 9582111698




Re: Regex for date

2018-08-25 Thread Mike Flannigan


Really, no attempt to do it yourself?


Mike


On 8/25/2018 6:06 AM, beginners-digest-h...@perl.org wrote:


Hi All ,

          I need  a regex to match the date : Sat Aug 25 08:41:03 2018 
and convert into a format: '%m/%d/%Y %H:%M:%S'


Thanks,

--
Asad Hasan




Re: regex to get the rpm name version

2018-08-09 Thread Andy Bach
You can put your separators in there as literals to keep them out of
captures:

$ cat /tmp/ver.pl
#!perl

while (<DATA>) {
  if ( /([\w+-]{3,})-([.\d-]+)\./ ) {
    print "$1 - $2\n";
  }
  print "$_\n";
}


__END__
binutils-2.23.52.0.1-12.el7.x86_64
compat-libcap1-1.10-3.el7.x86_64
compat-libstdc++-33-3.2.3-71.el7.i686

$ perl /tmp/ver.pl
binutils - 2.23.52.0.1-12
binutils-2.23.52.0.1-12.el7.x86_64

compat-libcap1 - 1.10-3
compat-libcap1-1.10-3.el7.x86_64

compat-libstdc++-33 - 3.2.3-71
compat-libstdc++-33-3.2.3-71.el7.i686

But you may want to look at the options for rpm listing. There are many and
they can specifically list the package version - you can create your own
format for the listings.

man rpm
   --qf|--queryformat QUERYFMT

   option, followed by the QUERYFMT format string.  Query formats are
   modified versions of the standard printf(3) formatting. The format is
   made up of static strings (which may include standard C character
   escapes for newlines, tabs, and other special characters) and printf(3)
   type formatters.  As rpm already knows the type to print, the type
   specifier must be omitted however, and replaced by the name of the
   header tag to be printed, enclosed by {} characters. Tag names are
   case insensitive, and the leading RPMTAG_ portion of the tag name may
   be omitted as well.


On Thu, Aug 9, 2018 at 4:32 PM, Home Linux Info 
wrote:

>
> Hello,
>
> You can begin with "[a-zA-Z_+-]{3,}[0-9]" to get the package name, it
> needs a little more work for right now it gets the last dash and first
> digit of package version. Then you can try "([^a-zA-Z_+-]{3,})(.\d{1,})".
> The first regex gives the following result:
> binutils-2
> compat-libcap1
> compat-libstdc++-3
> Which is almost what you need while the second one is more exact as it
> gives you:
> 2.23.52.0.1-12
> 1.10-3
> 3.2.3-71
> And that looks like exactly what you need.
>
> I'm no expert in regex but I like to experiment with it to see if I can
> extract some parts from a text / string using it.
>
> Jimmy (bash, perl and python total noob but trying to learn stuff).
>
> On 27.07.2018 15:54, Asad wrote:
>
> Hi All ,
>
>  I want to get a regex to actually get the rpm name and version
> for comparison :
>
>
> binutils-2.23.52.0.1-12.el7.x86_64",
> compat-libcap1-1.10-3.el7.x86_64"
> compat-libstdc++-33-3.2.3-71.el7.i686
>
> (^[a-zA-Z0-9\-]*)\-\d'
>
> First part of the regular expression is ^[a-zA-Z0-9\-]
>
> which means search for anything that begins with a letter
>
> (lower or upper) or a number up until you reach an
>
> hyphen sign (‘-‘).
>
> but it fails to match
>
> compat-libstdc++-33-3.2.3-71.el7.i686
>
> Please let me know what regex should i use to extract all 3
>
> rpms.
>
> Also let me know if there are web tools to build regex
>
> Good websites for regex tutorials.
>
> Thanks,
>
>
>
> --
> Asad Hasan
> +91 9582111698
>
>
>


-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk


Re: regex to get the rpm name version

2018-08-09 Thread Home Linux Info


Hello,

You can begin with "[a-zA-Z_+-]{3,}[0-9]" to get the package name. It
needs a little more work, because right now it also gets the last dash and
the first digit of the package version. Then you can try
"([^a-zA-Z_+-]{3,})(.\d{1,})".

The first regex gives the following result:
binutils-2
compat-libcap1
compat-libstdc++-3
That is almost what you need, while the second one is more exact, as it
gives you:

2.23.52.0.1-12
1.10-3
3.2.3-71
And that looks like exactly what you need.

I'm no expert in regex but I like to experiment with it to see if I can 
extract some parts from a text / string using it.


Jimmy (bash, perl and python total noob but trying to learn stuff).

On 27.07.2018 15:54, Asad wrote:

Hi All ,

         I want to get a regex to actually get the rpm name and 
version for comparison :



binutils-2.23.52.0.1-12.el7.x86_64",
compat-libcap1-1.10-3.el7.x86_64"
compat-libstdc++-33-3.2.3-71.el7.i686

(^[a-zA-Z0-9\-]*)\-\d'

First part of the regular expression is ^[a-zA-Z0-9\-]
which means search for anything that begins with a letter
(lower or upper) or a number up until you reach an
hyphen sign (‘-‘).
but it fails to match
compat-libstdc++-33-3.2.3-71.el7.i686
Please let me know what regex should i use to extract all 3
rpms.
Also let me know if there are web tools to build regex
Good websites for regex tutorials.
Thanks,


--
Asad Hasan
+91 9582111698




Re: regex to get the rpm name version

2018-07-27 Thread Shlomi Fish
Hi Asad,

On Fri, 27 Jul 2018 18:24:39 +0530
Asad  wrote:

> Hi All ,
> 
>  I want to get a regex to actually get the rpm name and version for
> comparison :
> 
> 
> binutils-2.23.52.0.1-12.el7.x86_64",
> compat-libcap1-1.10-3.el7.x86_64"
> compat-libstdc++-33-3.2.3-71.el7.i686
> 
> (^[a-zA-Z0-9\-]*)\-\d'
> 
> First part of the regular expression is ^[a-zA-Z0-9\-]
> 
> which means search for anything that begins with a letter
> 
> (lower or upper) or a number up until you reach an
> 
> hyphen sign (‘-‘).
> 
> but it fails to match
> 
> compat-libstdc++-33-3.2.3-71.el7.i686
> 
> Please let me know what regex should i use to extract all 3
> 
> rpms.
> 
> Also let me know if there are web tools to build regex
> 
> Good websites for regex tutorials.
> 

for that, see:

* http://perl-begin.org/topics/regular-expressions/

* https://github.com/aloisdg/awesome-regex

* https://www.regular-expressions.info/


> 
> 
> 
> Thanks,
> 
> 
> 
> 
> 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
https://youtu.be/GoEn1YfYTBM - Tiffany Alvord - “Fall Together”

C++ is complex, complexifying and complexified.
(With apologies to the Oxford English Dictionary).
— http://www.shlomifish.org/humour.html

Please reply to list if it's a mailing list post - http://shlom.in/reply .





Re: regex to get the rpm name version

2018-07-27 Thread Chas. Owens
But if you have to use a regex, I suggest using the /x modifier to make the
regex easier to read and maintain:

#!/usr/bin/perl

use strict;
use warnings;

for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
compat-libcap1-1.10-3.el7.x86_64 compat-libstdc++-33-3.2.3-71.el7.i686/) {
my ($name, $version, $build) = $s =~ m{
^ (.*) # name
- (.*) # version
- ([0-9]+) # build
[.] [^.]+  # os
[.] [^.]+ \z  # architecture
}x;
print "n $name v $version b $build\n";
}

On Fri, Jul 27, 2018 at 9:14 AM Chas. Owens  wrote:

> I don't think a regex is the simplest and most maintainable way to get
> this information.  I think it is probably better to take advantage of the
> structure of the string to discard and find information:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
> compat-libcap1-1.10-3.el7.x86_64 compat-libstdc++-33-3.2.3-71.el7.i686/) {
> my @dots = split /\./, $s;
> pop @dots; #get rid of architecture
> pop @dots; #get rid of os
> my $name_and_version = join ".", @dots;
> my @dashes = split /-/, $name_and_version;
> my $build = pop @dashes;
> my $version = pop @dashes;
> my $name = join "-", @dashes;
> print "n $name v $version b $build\n";
> }
>
>
>
> On Fri, Jul 27, 2018 at 8:57 AM Asad  wrote:
>
>> Hi All ,
>>
>>  I want to get a regex to actually get the rpm name and version
>> for comparison :
>>
>>
>> binutils-2.23.52.0.1-12.el7.x86_64",
>> compat-libcap1-1.10-3.el7.x86_64"
>> compat-libstdc++-33-3.2.3-71.el7.i686
>>
>> (^[a-zA-Z0-9\-]*)\-\d'
>>
>> First part of the regular expression is ^[a-zA-Z0-9\-]
>>
>> which means search for anything that begins with a letter
>>
>> (lower or upper) or a number up until you reach an
>>
>> hyphen sign (‘-‘).
>>
>> but it fails to match
>>
>> compat-libstdc++-33-3.2.3-71.el7.i686
>>
>> Please let me know what regex should i use to extract all 3
>>
>> rpms.
>>
>> Also let me know if there are web tools to build regex
>>
>> Good websites for regex tutorials.
>>
>>
>>
>>
>> Thanks,
>>
>>
>>
>>
>>
>> --
>> Asad Hasan
>> +91 9582111698
>>
>


RE: regex to get the rpm name version

2018-07-27 Thread Duncan Ferguson
I would suggest you change your approach and use the query mode of RPM to get 
your information instead of building up a regexp:

rpm -qa --queryformat "%{NAME}\n"
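A sketch of the Perl side of that approach, assuming the query format is extended with the VERSION and RELEASE tags and tab separators, e.g. rpm -qa --queryformat '%{NAME}\t%{VERSION}\t%{RELEASE}\n'. Because rpm already separates the fields, a plain split is enough; parse_qf() and the sample text are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse "NAME<TAB>VERSION<TAB>RELEASE" lines into name => "version-release".
sub parse_qf {
    my ($text) = @_;
    my %ver;
    for my $line (split /\n/, $text) {
        my ($name, $version, $release) = split /\t/, $line;
        next unless defined $release;      # skip malformed lines
        $ver{$name} = "$version-$release";
    }
    return \%ver;
}

# Hand-written sample in that format, not captured from a real system.
my $sample = "binutils\t2.23.52.0.1\t12.el7\n"
           . "compat-libstdc++-33\t3.2.3\t71.el7\n";

my $ver = parse_qf($sample);
print "$_ $ver->{$_}\n" for sort keys %$ver;
```

On a live system you would pass the command's output instead of the sample string. If the same package name is installed for two architectures, the later line overwrites the earlier one in the hash; adding %{ARCH} to the format would let you keep both.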

  Duncs

From: Asad [mailto:asad.hasan2...@gmail.com]
Sent: 27 July 2018 13:55
To: beginners@perl.org
Subject: regex to get the rpm name version

Hi All ,

 I want to get a regex to actually get the rpm name and version for 
comparison :



binutils-2.23.52.0.1-12.el7.x86_64",

compat-libcap1-1.10-3.el7.x86_64"

compat-libstdc++-33-3.2.3-71.el7.i686



(^[a-zA-Z0-9\-]*)\-\d'

First part of the regular expression is ^[a-zA-Z0-9\-]

which means search for anything that begins with a letter

(lower or upper) or a number up until you reach an

hyphen sign (‘-‘).



but it fails to match

compat-libstdc++-33-3.2.3-71.el7.i686

Please let me know what regex should i use to extract all 3

rpms.

Also let me know if there are web tools to build regex

Good websites for regex tutorials.

Thanks,

--
Asad Hasan
+91 9582111698


Re: regex to get the rpm name version

2018-07-27 Thread Chas. Owens
I don't think a regex is the simplest and most maintainable way to get this
information.  I think it is probably better to take advantage of the
structure of the string to discard and find information:

#!/usr/bin/perl

use strict;
use warnings;

for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
compat-libcap1-1.10-3.el7.x86_64 compat-libstdc++-33-3.2.3-71.el7.i686/) {
my @dots = split /\./, $s;
pop @dots; #get rid of architecture
pop @dots; #get rid of os
my $name_and_version = join ".", @dots;
my @dashes = split /-/, $name_and_version;
my $build = pop @dashes;
my $version = pop @dashes;
my $name = join "-", @dashes;
print "n $name v $version b $build\n";
}
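The asker's class [a-zA-Z0-9\-] has no '+', which is why the match gives up inside "libstdc++". For comparison, here is a sketch of a single anchored regex. It assumes the usual NAME-VERSION-RELEASE.ARCH layout, i.e. VERSION and RELEASE contain no '-' and ARCH contains no '.'; packages that break that assumption are simply skipped.

```perl
use strict;
use warnings;

my @packages = qw(
    binutils-2.23.52.0.1-12.el7.x86_64
    compat-libcap1-1.10-3.el7.x86_64
    compat-libstdc++-33-3.2.3-71.el7.i686
);

my %parsed;
for my $pkg (@packages) {
    # Greedy (.+) takes as much as possible for the name; the last two
    # dash-separated fields are version and release, the text after the
    # final dot is the architecture.
    my ($name, $version, $release, $arch) =
        $pkg =~ /\A(.+)-([^-]+)-([^-]+)\.([^.]+)\z/
        or next;    # does not fit the assumed layout
    $parsed{$name} = $version;
    print "name=$name version=$version release=$release arch=$arch\n";
}
```

For compat-libstdc++-33-3.2.3-71.el7.i686 this yields the name compat-libstdc++-33 and version 3.2.3.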



On Fri, Jul 27, 2018 at 8:57 AM Asad  wrote:

> Hi All,
>
>  I want a regex to get the rpm name and version
> for comparison:
>
> binutils-2.23.52.0.1-12.el7.x86_64",
> compat-libcap1-1.10-3.el7.x86_64"
> compat-libstdc++-33-3.2.3-71.el7.i686
>
> (^[a-zA-Z0-9\-]*)\-\d'
>
> The first part of the regular expression is ^[a-zA-Z0-9\-], which means
> search for anything that begins with a letter (lower or upper) or a
> number, up until you reach a hyphen sign ('-').
>
> But it fails to match
>
> compat-libstdc++-33-3.2.3-71.el7.i686
>
> Please let me know which regex I should use to extract all 3 rpms.
>
> Also let me know if there are web tools to build regexes, or good
> websites for regex tutorials.
>
> Thanks,
>
> --
> Asad Hasan
> +91 9582111698
>


Re: regex matches Chinese characters

2018-07-26 Thread Shlomi Fish
Hi Lauren,

On Fri, 27 Jul 2018 11:28:42 +0800
"Lauren C."  wrote:

> greetings,
> 
> I was doing the log statistics stuff using perl.
> There are chinese characters in log items.
> I tried with regex to match them, but got no luck.
> 
> $ perl -mstrict  -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/'
> 
> $ perl -mstrict -mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/'
> 
> both output nothing.
> 
> My terminal is UTF-8:
> 

According to http://perldoc.perl.org/perlrun.html , you probably need -Mstrict
and -Mutf8 instead of the lowercase -m. -mModule is equivalent to
"use Module ();", so the module's "sub import" is never called and the utf8
pragma never takes effect; -MModule calls it:

shlomif@telaviv1:~$ perl -Mstrict -Mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/'
it is chinese
shlomif@telaviv1:~$ 

HTH,

Shlomi
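A minimal sketch of why the pragma matters: without utf8 taking effect, the literal "汉语" in the one-liner is just the six UTF-8 bytes from the terminal, and no single byte is a Han character. Here Encode's decode stands in for what "use utf8" does to a source literal; the byte values are the UTF-8 encoding of those two characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What perl sees without "use utf8" (or -Mutf8): raw UTF-8 bytes.
my $bytes = "\xe6\xb1\x89\xe8\xaf\xad";

# What perl sees with the pragma: two characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, Han? %s\n",
    length $bytes, ($bytes =~ /\p{Han}/ ? 'yes' : 'no');
printf "chars: length %d, Han? %s\n",
    length $chars, ($chars =~ /\p{Han}/ ? 'yes' : 'no');
```

The same decoding step applies when the Chinese text comes from a log file rather than the command line, e.g. by opening the file with a '<:encoding(UTF-8)' layer.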

> $ locale
> LANG=en_US.UTF-8
> LANGUAGE=
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
> 
> 
> Can you help? thanks in advance.
> 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
https://github.com/sindresorhus/awesome - curated list of lists

Cats are smarter than dogs. You can’t get eight cats to pull a sled through
snow.— Source unknown, via Nadav Har’El.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: regex matches Chinese characters

2018-07-26 Thread Lauren C.

oops that's perfect. thanks Shlomi.

On Fri, 2018/7/27 at 1:26 PM, Shlomi Fish wrote:

Hi Lauren,

On Fri, 27 Jul 2018 11:28:42 +0800
"Lauren C."  wrote:


greetings,

I was doing the log statistics stuff using perl.
There are chinese characters in log items.
I tried with regex to match them, but got no luck.

$ perl -mstrict  -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/'

$ perl -mstrict -mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/'

both output nothing.

My terminal is UTF-8:



According to http://perldoc.perl.org/perlrun.html , you probably need -Mstrict
and -Mutf8 instead of the lowercase -m, so "sub import" will get called:

shlomif@telaviv1:~$ perl -Mstrict -Mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/'
it is chinese
shlomif@telaviv1:~$

HTH,

Shlomi


$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


Can you help? thanks in advance.











Re: Regex for date format

2018-06-29 Thread Mike Martin
Worked perfectly, thanks Uri. For info, the same technique also works in
PostgreSQL's regexp_replace.

On 29 June 2018 at 16:18, Mike Martin  wrote:

> Thanks
>
>
> On Fri, 29 Jun 2018, 15:48 Uri Guttman,  wrote:
>
>> On 06/29/2018 10:41 AM, Mike Martin wrote:
>>
>> sorry
>> -mm-dd hh:mm:ss.dd
>> eg:
>> 2018-01-01 12-45-10-456789 to
>> 2018-01-01 12:45:10.456789
>>
>>
>>
>> please reply to the list and not to me!
>>
>> then why did you want lookbehind? this is very easy if you just grab the
>> time parts and reassemble them as you want. 
>>
>> $stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./ ;
>>
>> it uses the space to mark where the time part starts.
>>
>> uri
>>
>>
>>


Re: Regex for date format

2018-06-29 Thread Mike Martin
Thanks

On Fri, 29 Jun 2018, 15:48 Uri Guttman,  wrote:

> On 06/29/2018 10:41 AM, Mike Martin wrote:
>
> sorry
> -mm-dd hh:mm:ss.dd
> eg:
> 2018-01-01 12-45-10-456789 to
> 2018-01-01 12:45:10.456789
>
>
>
> please reply to the list and not to me!
>
> then why did you want lookbehind? this is very easy if you just grab the
> time parts and reassemble them as you want. 
>
> $stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./ ;
>
> it uses the space to mark where the time part starts.
>
> uri
>
>
>


Re: Regex for date format

2018-06-29 Thread Uri Guttman

On 06/29/2018 10:41 AM, Mike Martin wrote:

sorry
-mm-dd hh:mm:ss.dd
eg:
2018-01-01 12-45-10-456789 to
2018-01-01 12:45:10.456789




please reply to the list and not to me!

then why did you want lookbehind? this is very easy if you just grab the 
time parts and reassemble them as you want. 


    $stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./ ;

it uses the space to mark where the time part starts.

uri
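Uri's substitution as a complete script, using the example string from this thread:

```perl
use strict;
use warnings;

my $stamp = '2018-01-01 12-45-10-456789';

# The leading \s pins the match to the start of the time part, so the
# dashes inside the date (2018-01-01) are left alone.
$stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./;

print "$stamp\n";    # 2018-01-01 12:45:10.456789
```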




Re: Regex for date format

2018-06-29 Thread Uri Guttman

On 06/29/2018 09:32 AM, Mike Martin wrote:

Hi
I am trying to convert a string of the format
2018-01-01 16-45-21-654278

to a proper timestamp string

so basically I want to replace all -  after the date part


i am not sure what you are trying to do. show the after text that you 
want. a proper timestamp string is not specific enough.


if you want to really parse that string, then use Time::Piece and its 
strptime sub which can parse pretty much any time/date string.


uri
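A sketch of the Time::Piece route, assuming the trailing digits are microseconds and that the wanted output is "date hh:mm:ss.fraction". The fraction is split off first so that strptime only has to handle whole seconds; the literal '-' characters in the format are matched as-is.

```perl
use strict;
use warnings;
use Time::Piece;

my $raw = '2018-01-01 16-45-21-654278';

# Separate the fractional part (everything after the last '-').
my ($main, $frac) = $raw =~ /\A(.*)-(\d+)\z/
    or die "unexpected format: $raw\n";

# Parse the rest; this also validates the date and time fields.
my $t = Time::Piece->strptime($main, '%Y-%m-%d %H-%M-%S');

# ymd uses '-' and hms uses ':' as their default separators.
my $stamp = $t->ymd . ' ' . $t->hms . ".$frac";
print "$stamp\n";
```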





Re: regex with HEX ascii chars

2018-04-15 Thread Mike Flannigan


Try:
binmode(HANDLE)
before reading the file.
HANDLE is your filehandle.


If that doesn't work you might want to supply the
text file and a sample script.


Mike


On 4/12/2018 12:04 PM, beginners-digest-h...@perl.org wrote:


I have a text file (created by  pdftotext) that I've imported into my script.

It contains ASCII characters 251 for crosses and 252 for ticks.  If I load the
file in gvim and do :as

it reports the characters as

 251, Hex 00fb, Octal 373
 252, hex 00fc, Octal 374

However, when I try to search for it using

if ($line=~/[\xfb|\xfc]/) {

or even just

if ($line=~/\xfb/) {

it always fails.  What am I doing wrong?

Gary




Re: regex with HEX ascii chars

2018-04-13 Thread John W. Krahn
On Thu, 2018-04-12 at 17:26 +0100, Gary Stainburn wrote:
> I have a text file (created by  pdftotext) that I've imported into my
> script.
> 
> It contains ASCII characters 251 for crosses and 252 for ticks.

ASCII defines 128 characters so those characters are not ASCII.


John





Re: regex with HEX ascii chars

2018-04-13 Thread Gary Stainburn
On Thursday 12 April 2018 19:53:16 Shlomi Fish wrote:
> Perhaps see http://perldoc.perl.org/perlunitut.html - you may need to read
> the file as binary or iso8859-1 or whatever. Also see

Thanks for this Shlomi. I have looked into that before briefly when doing http 
gets and reading office documents, but this time I didn't think I was going 
to need this.

> https://github.com/shlomif/how-to-share-code-online and read what Andy
> noted.

I thought the problem was with my concepts rather than the program itself.  The 
following code shows that I was wrong.

#!/usr/bin/perl 

use strict;
use warnings;

my $line="A û ü  û";
my @arr=($line=~/(\xc3.)/g);
my $tick="\xc3\xbc";
my $cross="\xc3\xbb";

foreach my $c (split //,$line) {
  printf "%s = %X %d\n",$c,ord($c),ord($c);
}
if ($line=~/\xc3\xbb/) { print "true\n";}
foreach my $a (@arr) {
  print "start\n";
  if ($a eq $tick)  { print "tick\n";}
  if ($a eq $cross) { print "cross\n";}
}

[root@lou inet]# ./t1
A = 41 65
  = 20 32
� = C3 195
� = BB 187
  = 20 32
� = C3 195
� = BC 188
  = 20 32
  = 20 32
� = C3 195
� = BB 187
true
start
cross
start
tick
start
cross
[root@lou inet]# 

When I went back to gvim I noticed that it started showing two column values 
as I go past these fields, which should have given me a clue.

My production code now includes the following working code:

my $tick="\xc3\xbc";
my $cross="\xc3\xbb";

my @ticks=($line=~/(\xc3.)/g);
if (scalar(@ticks) == 5) {
  if ($ticks[0] eq $tick) {$job{sj_mot}='true';}
  if ($ticks[1] eq $tick) {$wuw='true'; $job{sj_wait}=20;}
  if ($ticks[2] eq $tick) {$job{sj_c_car}='true';}
  # 3 = advisor which we don't use
  if ($ticks[4] eq $tick) {$job{sj_wait}=30;}
} else {
  debugprint(1,"incorrect tick/cross count returned");
}
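As John notes, 251 and 252 are not ASCII; in the file they are the two-byte UTF-8 sequences that Gary found. A sketch of the alternative to matching bytes: decode the text first so that \x{fb} and \x{fc} are single characters. It also uses [\x{fb}\x{fc}] without the '|', since (as Andy points out) a '|' inside [...] is a literal character. In a real script the decoding would normally happen on open, e.g. open my $fh, '<:encoding(UTF-8)', $file (the file name is hypothetical).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as pdftotext wrote them: each of û and ü is two bytes in UTF-8.
my $raw  = "A \xc3\xbb \xc3\xbc  \xc3\xbb";
my $line = decode('UTF-8', $raw);    # now one character per mark

# Character class only: no '|' needed between the alternatives.
my @marks = $line =~ /([\x{fb}\x{fc}])/g;
my @words = map { $_ eq "\x{fc}" ? 'tick' : 'cross' } @marks;

print "@words\n";
```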





Re: regex with HEX ascii chars

2018-04-12 Thread Shlomi Fish
On Thu, 12 Apr 2018 17:26:57 +0100
Gary Stainburn  wrote:

> I have a text file (created by  pdftotext) that I've imported into my script.
> 
> It contains ASCII characters 251 for crosses and 252 for ticks.  If I load
> the file in gvim and do :as
> 
> it reports the characters as 
> 
>  251, Hex 00fb, Octal 373
>  252, hex 00fc, Octal 374
> 
> However, when I try to search for it using
> 
> if ($line=~/[\xfb|\xfc]/) {
> 
> or even just 
> 
> if ($line=~/\xfb/) { 
> 
> it always fails.  What am I doing wrong?
> 

Perhaps see http://perldoc.perl.org/perlunitut.html - you may need to read the
file as binary or iso8859-1 or whatever. Also see
https://github.com/shlomif/how-to-share-code-online and read what Andy noted.

> Gary
> 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
https://github.com/shlomif/what-you-should-know-about-automated-testing

It’s easier to port a shell than a shell script.
— http://en.wikiquote.org/wiki/Larry_Wall

Please reply to list if it's a mailing list post - http://shlom.in/reply .





Re: regex with HEX ascii chars

2018-04-12 Thread Andy Bach
> However, when I try to search for it using

if ($line=~/[\xfb|\xfc]/) {

Note, you're mixing the character class "[ab]" with the grouping alternative
pipe "(a|b)" here: inside [...] the | is just a literal character, so
[\xfb|\xfc] also matches a "|".

> or even just

if ($line=~/\xfb/) {

Dunno, works here:
$ perl -e '$line = "hi" . chr 251 . "ho" . chr 252 ; if
($line=~/[\xfb\xfc]/) { print "yep" } print "\n"'
yep
$ perl -e '$line = "hi" . chr 250 . "ho" . chr 253 ; if
($line=~/[\xfb\xfc]/) { print "yep" } print "\n"'
[crickets]


So, I'd guess your $line doesn't have a \xfb or \xfc in it at the time of
the test.
$ perl -e '$line = "hi" . chr 251 . "ho" . chr 253 ; if
($line=~/([\xfb\xfc])/) { print "yep: $1" } print "\n"' | od -c
000   y   e   p   : 373  \n
007


On Thu, Apr 12, 2018 at 11:26 AM, Gary Stainburn <
gary.stainb...@ringways.co.uk> wrote:

> I have a text file (created by  pdftotext) that I've imported into my
> script.
>
> It contains ASCII characters 251 for crosses and 252 for ticks.  If I load
> the
> file in gvim and do :as
>
> it reports the characters as
>
>  251, Hex 00fb, Octal 373
>  252, hex 00fc, Octal 374
>
> However, when I try to search for it using
>
> if ($line=~/[\xfb|\xfc]/) {
>
> or even just
>
> if ($line=~/\xfb/) {
>
> it always fails.  What am I doing wrong?
>
> Gary
>
>
>
>


-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk


Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Shawn H Corey
On Sat, 05 Nov 2016 21:30:12 +
Aaron Wells  wrote:

> True. It could get hairy. Unicode is a pretty vast landscape, and I
> think if you only want ASCII word characters to count as things that
> could be in a filename, your original [A-Za-z0-9_] is your best bet.
> Thanks to the others for their comments. As Ken says: there are
> probably more ways to code this.

TIMTOWTDI
https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to_do_it

;)





Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Octavian Rasnita
From: Aaron Wells 


  True. It could get hairy. Unicode is a pretty vast landscape, and I think if 
you only want ASCII word characters to count as things that could be in a 
filename, your original [A-Za-z0-9_] is your best bet. Thanks to the others for 
their comments. As Ken says: there are probably more ways to code this.




Another (shorter) way of writing that can be:

/^\w+$/aa

Where /aa makes \w mean just [A-Za-z0-9_] (a single /a already does that).

a = ASCII; the doubled aa additionally stops ASCII and non-ASCII characters
from matching each other under /i, so only ASCII is used.

--Octavian



Re: Regex for matching files that don't have type extensions

2016-11-05 Thread X Dungeness
On Sat, Nov 5, 2016 at 10:55 AM, Jovan Trujillo
 wrote:
> Hi Aaron,
>In perlre I read that \w
> "
>
> \w[3]  Match a "word" character (alphanumeric plus "_", plus
>        other connector punctuation chars plus Unicode
>        marks)
>
> "
>
> So since I didn't know what these 'other' connector punctuation chars are I
> avoided it. Unicode makes things more complicated for me. Do you know?
>
>

To exclude Unicode and ensure only ASCII, use the /a modifier,
eg,  /\w+/a

From perlre:

/a

   is the same as "/u", except that "\d", "\s", "\w", and the Posix
   character classes are restricted to matching in the ASCII range only.
   That is, with this modifier, "\d" always means precisely the digits "0"
   to "9"; "\s" means the five characters "[ \f\n\r\t]"; "\w" means the 63
   characters "[A-Za-z0-9_]"; and likewise, all the Posix classes such as
   "[[:print:]]" match only the appropriate ASCII-range characters.





Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Aaron Wells
True. It could get hairy. Unicode is a pretty vast landscape, and I think
if you only want ASCII word characters to count as things that could be in
a filename, your original [A-Za-z0-9_] is your best bet. Thanks to the
others for their comments. As Ken says: there are probably more ways to
code this.

On Sat, Nov 5, 2016, 11:44 AM Kent Fredric  wrote:

> On 6 November 2016 at 06:14, Jovan Trujillo 
> wrote:
> >
> > 1207003PE_GM_09TNPLM2.csv
> >
> > I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
> > strings.
> > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
> > captured.
>
> Alternatively, if your use case allows it, it might be more viable to
> use negative matching.
>
>   $file !~ /[.]/ and print "$file has no extension"
>
> There's probably a reason why you're not doing this already, but can't
> tell from the context.
>
> NB: Clearly defining what an "extension" means is also pertinent:
>
> fooo.csv
> fooo.jpg
> fooo.jpeg
> foo.tar.xz
> foo.config
> .config
> .config.ini
>
> You probably are just meaning "has a dot" or "has a dot followed by at
> most 3 characters", but it's hard to tell from context ( and there are
> a lot of obvious cases where there is an "extension" suffix that is
> greater than 3 characters )
>
>
>
>
>
>
>
> --
> Kent
>
> KENTNL - https://metacpan.org/author/KENTNL
>
>
>
>


Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Kent Fredric
On 6 November 2016 at 06:14, Jovan Trujillo  wrote:
>
> 1207003PE_GM_09TNPLM2.csv
>
> I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
> strings.
> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
> captured.

Alternatively, if your use case allows it, it might be more viable to
use negative matching.

  $file !~ /[.]/ and print "$file has no extension"

There's probably a reason why you're not doing this already, but can't
tell from the context.

NB: Clearly defining what an "extension" means is also pertinent:

fooo.csv
fooo.jpg
fooo.jpeg
foo.tar.xz
foo.config
.config
.config.ini

You probably are just meaning "has a dot" or "has a dot followed by at
most 3 characters", but it's hard to tell from context ( and there are
a lot of obvious cases where there is an "extension" suffix that is
greater than 3 characters )
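Kent's point about definitions, as a sketch that classifies his example names under two assumed definitions of "extension": any dot at all, or a final dot followed by one to three non-dot characters. The two disagree on several of the names.

```perl
use strict;
use warnings;

my @names = qw(fooo.csv fooo.jpg fooo.jpeg foo.tar.xz foo.config .config .config.ini);

my (%has_dot, %short_ext);
for my $file (@names) {
    # Definition 1: any dot counts as an extension.
    $has_dot{$file}   = $file =~ /[.]/ ? 1 : 0;
    # Definition 2: a last dot followed by 1 to 3 non-dot characters.
    $short_ext{$file} = $file =~ /\.[^.]{1,3}\z/ ? 1 : 0;
    printf "%-12s has_dot=%d short_ext=%d\n",
        $file, $has_dot{$file}, $short_ext{$file};
}
```

Under the second definition fooo.jpeg, foo.config and .config have no extension, while foo.tar.xz and .config.ini do.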







-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL





Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Ken Slater
Hi Jovan,

On Sat, Nov 5, 2016 at 1:14 PM, Jovan Trujillo 
wrote:

> Hi All,
> I thought I could use a simple regex to match files like this:
>
> 1207003PE_GM_09TNPLM2
>
> and ignore files with extensions like this:
>
> 1207003PE_GM_09TNPLM2.csv
>
> I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
> strings.
> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
> captured.
>
> What am I doing wrong?
>
> Thank you,
> Jovan
>

The regular expression *m/[A-Za-z0-9\_]+(?!\.)/* will match, as it will
match one or more of the desired characters (*1207003PE_GM_09TNPLM*) that
are followed by a character (*2*) that is not a period/dot.

There are probably many ways to code this. The simplest may be to run two
regular expressions - the first to determine if there is a period/dot (*.*)
in the string.

HTH, Ken
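A small script comparing the patterns from this thread on the two sample names. It uses \A and \z rather than ^ and $ so that a trailing newline is not accepted; the /a modifier needs perl 5.14 or later.

```perl
use strict;
use warnings;

my @files = ('1207003PE_GM_09TNPLM2', '1207003PE_GM_09TNPLM2.csv');

my %result;
for my $f (@files) {
    $result{$f} = {
        # Unanchored: succeeds on a prefix of either name, so both match.
        loose    => ($f =~ /[A-Za-z0-9_]+(?!\.)/ ? 1 : 0),
        # Anchored at both ends: every character must be in the class.
        anchored => ($f =~ /\A[A-Za-z0-9_]+\z/ ? 1 : 0),
        # Same as above, with \w limited to ASCII by /a.
        ascii_w  => ($f =~ /\A\w+\z/a ? 1 : 0),
    };
    printf "%-26s loose=%d anchored=%d ascii_w=%d\n",
        $f, @{ $result{$f} }{qw(loose anchored ascii_w)};
}
```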


Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Jovan Trujillo
Hi Aaron,
   In perlre I read that \w
"

  \w[3]  Match a "word" character (alphanumeric plus "_", plus
         other connector punctuation chars plus Unicode
         marks)

"

So since I didn't know what these 'other' connector punctuation chars are
I avoided it. Unicode makes things more complicated for me. Do you know?


Thanks,

Jovan

On Sat, Nov 5, 2016 at 10:27 AM, Aaron Wells  wrote:

> *predefined
>
> On Sat, Nov 5, 2016, 10:27 AM Aaron Wells  wrote:
>
>> Hi Jovan. \w is a presidents character classes that is equivalent to
>> [A-Za-z0-9_], so this works also:
>> m/^\w+$/
>>
>> On Sat, Nov 5, 2016, 10:24 AM Jovan Trujillo 
>> wrote:
>>
>> Ah, I figured it out.
>>  m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string
>> follows the pattern. Thanks!
>>
>> On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo <
>> jovan.trujil...@gmail.com> wrote:
>>
>> Hi All,
>> I thought I could use a simple regex to match files like this:
>>
>> 1207003PE_GM_09TNPLM2
>>
>> and ignore files with extensions like this:
>>
>> 1207003PE_GM_09TNPLM2.csv
>>
>> I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
>> strings.
>> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
>> captured.
>>
>> What am I doing wrong?
>>
>> Thank you,
>> Jovan
>>
>>
>>


Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Aaron Wells
*predefined

On Sat, Nov 5, 2016, 10:27 AM Aaron Wells  wrote:

> Hi Jovan. \w is a presidents character classes that is equivalent to
> [A-Za-z0-9_], so this works also:
> m/^\w+$/
>
> On Sat, Nov 5, 2016, 10:24 AM Jovan Trujillo 
> wrote:
>
> Ah, I figured it out.
>  m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string
> follows the pattern. Thanks!
>
> On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo  > wrote:
>
> Hi All,
> I thought I could use a simple regex to match files like this:
>
> 1207003PE_GM_09TNPLM2
>
> and ignore files with extensions like this:
>
> 1207003PE_GM_09TNPLM2.csv
>
> I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
> strings.
> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
> captured.
>
> What am I doing wrong?
>
> Thank you,
> Jovan
>
>
>


Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Aaron Wells
Hi Jovan. \w is a presidents character classes that is equivalent to
[A-Za-z0-9_], so this works also:
m/^\w+$/

On Sat, Nov 5, 2016, 10:24 AM Jovan Trujillo 
wrote:

> Ah, I figured it out.
>  m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string
> follows the pattern. Thanks!
>
> On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo  > wrote:
>
> Hi All,
> I thought I could use a simple regex to match files like this:
>
> 1207003PE_GM_09TNPLM2
>
> and ignore files with extensions like this:
>
> 1207003PE_GM_09TNPLM2.csv
>
> I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
> strings.
> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
> captured.
>
> What am I doing wrong?
>
> Thank you,
> Jovan
>
>
>


Re: Regex for matching files that don't have type extensions

2016-11-05 Thread Jovan Trujillo
Ah, I figured it out.
 m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string
follows the pattern. Thanks!

On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo 
wrote:

> Hi All,
> I thought I could use a simple regex to match files like this:
>
> 1207003PE_GM_09TNPLM2
>
> and ignore files with extensions like this:
>
> 1207003PE_GM_09TNPLM2.csv
>
> I originally thought m/[A-Za-z0-9\_]+/ would work, but it captures both
> strings.
> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings
> captured.
>
> What am I doing wrong?
>
> Thank you,
> Jovan
>


Re: Regex to match "bad" characters in a parameter

2016-01-27 Thread lee
"Chris Charley"  writes:

> You could do that in 1 line - See the following small program.
> (The line using a 'grep' solution is commented out. It would work as well).
>
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> while (my $id = <DATA>) {
>chomp $id;
>#if (grep /itemid=.*?[^\w-]/, split /&/, $id) {
>if ($id =~ /itemid/ && $id !~ /itemid=[\w-]+(?:&|$)/) {
>print "Bad id: <$id>\n";
>}
> }
>
> __DATA__
> itemid=AT18C_AT18C=1=main.htm=1=1=detail.htm=asc
> c=detail.htm=AT18C
> itemid=AT18/C
> t=main.htm=1=1=detail.htm=asc

This might be a string with a bad item id because there is none: Are you
going to process the string, assuming that it is a good item id?

How do you determine the beginning of the relevant sequence --- and thus
whether the string contains a good item id or not --- when the string
might not contain 'itemid' to designate the beginning?

I think you might need to work with cleaner definitions, and/or attempt
to find the good item ids instead of the bad ones.

> itemid=?AT18C
>
>
> When this is run, it prints out:
>
> Bad id: 

Re: Regex to match "bad" characters in a parameter

2016-01-26 Thread SSC_perl
On Jan 25, 2016, at 4:59 PM, Shawn H Corey wrote:
> 
> Use the negative match operator !~
> 
>  if( $QUERY_STRING !~ m{ itemid = [-0-9A-Za-z_]+? (?: \& | \z ) }msx ){
>print "bad: $QUERY_STRING\n";
>  }

Thanks for that, Shawn.  It works perfectly except for one criterion 
that I inadvertently forgot to include.  It's possible that the string will 
_not_ contain the itemid parameter at all.  When that's missing, the regex 
matches and it shouldn't.  I guess that's why I was trying to stay with the 
positive match operator.

I tried inverting your regex:

if ( $QUERY_STRING =~ m/ itemid= .*? [^-0-9A-Za-z_]+? .*? (?: \& | \z ) /sx ) {
   say "bad: $QUERY_STRING";
}

but that doesn't work either.  It catches even good item numbers.

In the meantime, I got it to work by grabbing the itemid and working 
with that separately:

my ($item_id) = $QUERY_STRING =~ m/ itemid=([^&]*) /x;
if ( defined $item_id && $item_id =~ m/ [^a-zA-Z0-9_-] /x ) { ...

however, I'd like to do that with a single line, if possible, so I don't have 
to create a new variable just for that.

Thanks,
Frank




Re: Regex to match "bad" characters in a parameter

2016-01-26 Thread Chris Charley



"SSC_perl"  wrote in message 
news:ef7499af-b4a5-4b07-8c69-3192ef782...@surfshopcart.com...



On Jan 25, 2016, at 4:59 PM, Shawn H Corey wrote:


Use the negative match operator !~

 if( $QUERY_STRING !~ m{ itemid = [-0-9A-Za-z_]+? (?: \& | \z ) }msx ){
   print "bad: $QUERY_STRING\n";
 }


Thanks for that, Shawn.  It works perfectly except for one criterion that I 
inadvertently forgot to include.  It's possible that the string will _not_ 
contain the itemid parameter at all.  When that's missing, the regex 
matches and it shouldn't.  I guess that's why I was trying to stay with the 
positive match operator.


I tried inverting your regex:

if ( $QUERY_STRING =~ m/ itemid= .*? [^-0-9A-Za-z_]+? .*? (?: \& | \z ) /sx ) {
   say "bad: $QUERY_STRING";
}

but that doesn't work either.  It catches even good item numbers.

In the meantime, I got it to work by grabbing the itemid and working with 
that separately:


my $item_id = $1 if ($QUERY_STRING =~ m/ itemid=([^&]*) /x);
if ( $item_id =~ m/ [^a-zA-Z0-9_-] /x ) { ...

however, I'd like to do that with a single line, if possible, so I don't 
have to create a new variable just for that.


Thanks,
Frank


###
###

Hello Frank,

You could do that in 1 line - See the following small program.
(The line using a 'grep' solution is commented out. It would work as well).


#!/usr/bin/perl
use strict;
use warnings;

while (my $id = <DATA>) {
   chomp $id;
   #if (grep /itemid=.*?[^\w-]/, split /&/, $id) {
   if ($id =~ /itemid/ && $id !~ /itemid=[\w-]+(?:&|$)/) {
   print "Bad id: <$id>\n";
   }
}

__DATA__
itemid=AT18C_AT18C=1=main.htm=1=1=detail.htm=asc
c=detail.htm=AT18C
itemid=AT18/C
t=main.htm=1=1=detail.htm=asc
itemid=?AT18C


When this is run, it prints out:

Bad id: 
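For comparison, a sketch of a single regex that flags a bad item id and also ignores strings with no itemid parameter. It assumes itemid is either the first parameter or follows an '&', and that a good id is made only of ASCII letters, digits, '_' and '-'. The sample query strings are hypothetical.

```perl
use strict;
use warnings;

my @queries = (
    'itemid=AT18C_AT18C&pg=1',     # good id
    'c=detail.htm&itemid=AT18C',   # good id, not the first parameter
    'itemid=AT18/C',               # bad: contains '/'
    'itemid=?AT18C',               # bad: contains '?'
    't=main.htm&pg=1',             # no itemid at all: not reported
);

my @bad;
for my $q (@queries) {
    # itemid= at the start or after '&', then any run of good characters,
    # then one character that is neither good nor the '&' separator.
    push @bad, $q if $q =~ /(?:\A|&)itemid=[-\w]*[^-\w&]/a;
}
print "Bad id: <$_>\n" for @bad;
```

Because the pattern must find an offending character before the next '&' or the end of the string, a missing itemid simply means no match, so no separate variable is needed.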

Re: Regex to match "bad" characters in a parameter

2016-01-26 Thread SSC_perl
On Jan 26, 2016, at 11:22 AM, Chris Charley wrote:
> 
> You could do that in 1 line - See the following small program.

Thanks, Chris.  That'll do the trick.  And the grep alternative is 
interesting, too.  I hadn't thought of that.

Regards,
Frank




Re: Regex to match "bad" characters in a parameter

2016-01-25 Thread Shawn H Corey
On Mon, 25 Jan 2016 16:16:40 -0800
SSC_perl  wrote:

>   I'm trying to find a way to trap bad item numbers.  I want to
> parse the parameter "itemid=" and then everything up to either an "&"
> or end-of-string.  A good item number will contain only ASCII
> letters, numbers, dashes, and underscores and may terminate with a
> "&" or it may not (see samples below).   The following string should
> test negative in the regex below:
> 
> my $QUERY_STRING = 'itemid=AT18C_AT18C=1';
> 
> but a string containing "itemid=AT18/C" should test positive, since
> it has a slash.
> 
>   I can catch a single bad character and get it to work, e.g.
> 
> if ( $QUERY_STRING =~ m| itemid= .*? [/]+? .*? &? |x ) {
> 
> but I'd like to do something like this instead to catch others:
> 
> if ( $QUERY_STRING =~ m| itemid= (?: .*? [^a-zA-Z0_-]+ .*? ) &? |x )
> { ...
> 
>   Unfortunately, I can't get it to work.  I've read perlretut,
> but can't see the answer.  What am I doing wrong?
> 
> Thanks,
> Frank
> 
> Here are a couple of test strings:
> 
> 'itemid=AT18C_AT18C=1=main.htm=1=1=detail.htm=asc'
> 
> 'c=detail.htm=AT18C'
> 
> 
> 
> 

Use the negative match operator !~

  if( $QUERY_STRING !~ m{ itemid = [-0-9A-Za-z_]+? (?: \& | \z ) }msx ){
print "bad: $QUERY_STRING\n";
  }


-- 
Don't stop where the ink does.
Shawn





Re: regex problem?

2015-11-25 Thread Andrew Solomon
The only problem I can see is that you want UPPERCASE-1234 and your regex
has lowercase. Try

(\A[A-Z]+)   # match and capture leading alphabetics


Andrew

p.s Why not add "use strict; use warnings", "my $var;" and wear a seat belt
when you're driving?:)



On Wed, Nov 25, 2015 at 5:09 PM, Rick T  wrote:

> The following code apparently is not doing what I wanted. My intention was
> to confirm that the general format of  $student_id was this: several
> uppercase letters followed by a hyphen followed by several digits. If not,
> it would trigger the die. Unfortunately it seems to always trigger the die.
> For example, if I let student_id = triplett-1, the script dies. I’m a
> beginner, so I often have trouble seeing the “obvious.” Any suggestions
> will be appreciated!
>
> if  ( $student_id =~
>         /
>         (\A[a-z]+)   # match and capture leading alphabetics
>         -            # hyphen to separate surname from number
>         ([0-9]+\z)   # match and capture trailing digits
>         /xms         # Perl Best Practices
>     ) {
>     $student_surname = $1;
>     $student_number  = $2;
> }
> else {
>     die "Bad general form for student_id: $student_id"
> };
>
>
>
>
>


-- 
Andrew Solomon

Mentor@Geekuni http://geekuni.com/
http://www.linkedin.com/in/asolomon


Re: regex problem?

2015-11-25 Thread Shawn H Corey
On Wed, 25 Nov 2015 17:22:04 +
Andrew Solomon  wrote:

> The only problem I can see is that you want UPPERCASE-1234 and your
> regex has lowercase. Try
> 
> (\A[A-Z]+)   # match and capture leading alphabetics

Please put the anchor outside the capture. And you could use the POSIX
conventions:

m{ \A ([[:upper:]]+) }msx;

This will work with non-English characters. :)
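Putting Andrew's and Shawn's suggestions together, a sketch of the check with uppercase letters and the anchors outside the captures. Wrapping it in a sub (the name parse_student_id is hypothetical) makes it easy to try a few ids.

```perl
use strict;
use warnings;

sub parse_student_id {
    my ($student_id) = @_;
    if ( $student_id =~ m{
            \A
            ( [[:upper:]]+ )   # surname: one or more uppercase letters
            -                  # hyphen separating surname from number
            ( [0-9]+ )         # trailing digits
            \z
        }xms ) {
        return ($1, $2);
    }
    die "Bad general form for student_id: $student_id\n";
}

my ($surname, $number) = parse_student_id('TRIPLETT-1');
print "$surname $number\n";
```

With this version 'triplett-1' is rejected, since the surname must be uppercase as described in the original question.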


-- 
Don't stop where the ink does.
Shawn





Re: regex capture question

2015-06-18 Thread Илья Рассадин
Hi, Tiago!

I can't reproduce such behaviour

use Modern::Perl '2014';
my $string = 'Crosses   misses=50   ';
my (@matches) = ($string =~ /(Crosses)(.*)(misses=)(\d+)/s);

use Data::Dumper;
print Dumper \@matches;

result:
$VAR1 = [
  'Crosses',
  ' ',
  'misses=',
  '50'
];

As you see, no tabs in $matches[3];

Please, publish string that you try to parse with this regexp.
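A core-modules-only version of the same check, using a hypothetical tab-delimited line since the real input was not posted. Tabs are printed as \t so any that end up in a capture would be visible; (\d+) itself can only capture digits.

```perl
use strict;
use warnings;

# Hypothetical tab-delimited input line.
my $line = "Crosses\t\tmisses=50\t\n";

my @matches = $line =~ /(Crosses)(.*)(misses=)(\d+)/s;

for my $i (0 .. $#matches) {
    (my $shown = $matches[$i]) =~ s/\t/\\t/g;    # make tabs visible
    printf "\$%d = '%s'\n", $i + 1, $shown;
}
```

If a tab still shows up after $4, it is more likely added when the value is printed or joined later than captured by (\d+).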


On Thu, 18 June 2015 at 16:24, Tiago Hori tiago.h...@gmail.com wrote:

 Folks,

 I have the following regex: $_ =~ /(Crosses)(.*)(misses=)(\d+)/s

 It does what I need to do in terms of matching, but I also want to use the
 capture parentheses. The data comes from tab-delimited files and I use $4 to
 grab the last digits of the match, however it is also matching the trailing
 tab. I solved it by stripping off the tabs from the line, but I can't figure
 out why (\d+) is also matching the tab!

 T.

 --
 Education is not to be used to promote obscurantism. - Theodosius
 Dobzhansky.

 Thanks to life, which has given me so much
 It gave me sound and the alphabet
 With it, the words that I think and declare
 Mother, friend, brother
 And light shining on the path of the soul of the one I love

 Thanks to life, which has given me so much
 It gave me the march of my tired feet
 With them I walked through cities and puddles
 Beaches and deserts, mountains and plains
 And your house, your street and your patio

 Violeta Parra - Gracias a la Vida

 Tiago S. F. Hori. PhD.
 Ocean Science Center-Memorial University of Newfoundland



Re: regex capture question

2015-06-18 Thread Shlomi Fish
Hi Tiago,

Please reply to list if it's a mailing list post - http://shlom.in/reply .

On Thu, 18 Jun 2015 10:20:57 -0300
Tiago Hori tiago.h...@gmail.com wrote:

 Folks,
 
 I have the following regex: $_ =~ /(Crosses)(.*)(misses=)(\d+)/s
 
 It does what I need in terms of matching, but I also want to use the
 capture parentheses. The data comes from tab-delimited files and I use $4 to
 grab the last digits of the match; however, it is also matching the trailing
 tab. I solved it by stripping off the tabs from the line, but I can't figure
 out why (\d+) is also matching the tab!
 

That sounds strange and perl should not do that. Can you post a self-contained
and reproducing example that exhibits this behaviour? Sometimes reducing your
code to the bare reproducing minimum helps in finding where the problem is. I
could also use some information about your system (OS, distribution, perl,
versions, CPU architecture, etc.)

Regards,

Shlomi Fish

 T.
 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
http://www.shlomifish.org/humour/ways_to_do_it.html

Chuck Norris can construct any logical expression using only AND gates.
— http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - http://shlom.in/reply .





Re: regex and parse

2014-03-11 Thread Paolo Gianrossi
A classic is Mastering Regular Expressions by Jeffrey E.F. Friedl (
http://shop.oreilly.com/product/9781565922570.do)

A quick google search also brings out, e.g.
http://stackoverflow.com/questions/4736/learning-regular-expressions with
many links to resources.

HTH
paolo

--
Paolo Gianrossi

Like my grandma used to say,
don't sail an aluminium boat on a gallium lake.
(My grandma was a little strange.)
   -- xkcd


On 11 March 2014 18:01, Ariel Hosid ariel.ho...@gmail.com wrote:

 Hello everyone!
 Can anyone recommend literature that covers regular expressions and how
 to analyze files?
 Thank you!

 --
 Ariel



Re: regex and parse

2014-03-11 Thread Rob Dixon

On 11/03/2014 17:01, Ariel Hosid wrote:


Can anyone recommend literature that covers regular expressions and
how to analyze files?


The best documentation on regular expressions is Perl's own, here:

http://perldoc.perl.org/perlre.html

Analysing files is an enormous subject that is difficult to generalise.
Perhaps you should read what you can find on the internet and come back
here if you have a specific problem?

Rob






Re: regex and parse

2014-03-11 Thread Charles DeRykus
On Tue, Mar 11, 2014 at 10:01 AM, Ariel Hosid ariel.ho...@gmail.com wrote:
 Hello everyone!
 Can anyone recommend literature that covers regular expressions and how
 to analyze files?

Some perl resources:

perldoc perlrequick   (Perl regular expressions quick start)
        perlretut     (Perl reg exp tutorial)
        perlre        (Perl regular expressions, the rest of the story)

--
Charles DeRykus





Re: regex and parse

2014-03-11 Thread Andy Bach
On Tue, Mar 11, 2014 at 12:13 PM, Paolo Gianrossi 
paolino.gianro...@gmail.com wrote:

 A classic is Mastering Regular Expressions by Jeffrey E.F. Friedl (
 http://shop.oreilly.com/product/9781565922570.do)


Just to +1 this - one of the best RE and programming books ever - it's not
just Perl REs but he covers and compares many other languages.  It's also
funny, as in LOL!


-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk


Re: regex and parse

2014-03-11 Thread Ariel Hosid
OK!
Thanks to all! :-)


2014-03-11 15:34 GMT-03:00 Andy Bach afb...@gmail.com:


 On Tue, Mar 11, 2014 at 12:13 PM, Paolo Gianrossi 
 paolino.gianro...@gmail.com wrote:

 A classic is Mastering Regular Expressions by Jeffrey E.F. Friedl (
 http://shop.oreilly.com/product/9781565922570.do)


 Just to +1 this - one of the best RE and programming books ever - it's not
 just Perl REs but he covers and compares many other languages.  It's also
 funny, as in LOL!


 --

 a

 Andy Bach,
 afb...@gmail.com
 608 658-1890 cell
 608 261-5738 wk




-- 
Ariel


Re: regex to get version from file name

2014-02-23 Thread Wernher Eksteen
Hi,

Thanks, but how do I assign the value found by the regex to a variable, so
that the 1.2.4 from the 6 file names in the array @fileList is printed only
once, and, if other versions such as 1.2.5 and 1.2.6 are found, the unique
values from all of them are printed?

This is my script thus far. The aim of this script is to connect to the
site, remove all html tags and obtain only the file names I need.

#!/usr/bin/perl

use strict;
use warnings;

# initialise the arrays for the package names used later
my @getList;
my @fileList;

# get files using lynx and parse through it
my $url = "http://mathias-kettner.com/download";
open my $in, "lynx -dump $url |" or die $!;

# get the bits we need and push them to an array to filter further
while(<$in>){
  chomp;
  if( /\[(\d+)\](.+)/ ){
    next if $1 == 1;
    push @getList, "$2\n";
  }
}

# filter only the files we need into the final array
foreach my $i (@getList) {
  my @list = split /\s+/, $i;
  push @fileList, "$list[0]\n" if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/;
}

# print the list
print "\nList of files to be retrieved from $url:\n\n @fileList\n";


The output is then:
List of files to be retrieved from http://mathias-kettner.com/download:

 check_mk-1.2.4.tar.gz
 check_mk-agent-1.2.4-1.noarch.rpm
 check_mk-agent-logwatch-1.2.4-1.noarch.rpm
 check_mk-agent-oracle-1.2.4-1.noarch.rpm
 mk-livestatus-1.2.4.tar.gz
 mkeventd-1.2.4.tar.gz

From that I want to get the value 1.2.4 and assign it to a variable. If
there is more than one value, such as 1.2.5 and 1.2.6 as well, it should
print those too, but only the unique values.

My attempt to print only the value 1.2.4 is shown below, but it prints
1.2.41.2.41.2.41.2.41.2.41.2.4 all run together; if I add a newline to $i,
such as "$i\n", it then prints 11?

foreach my $i (@fileList) {
print $i =~  /\b(\d+\.\d+\.\d+)\b/;
}

Thank you,
Wernher

On Fri, Feb 21, 2014 at 4:27 PM, Shawn H Corey shawnhco...@gmail.com wrote:

 On Fri, 21 Feb 2014 16:21:57 +0200
 Wernher Eksteen crypt...@gmail.com wrote:

  Hi all,
 
  From the below file names I only need the version number 1.2.4 without
  explicitly specifying it.
 
   check_mk-1.2.4.tar.gz
   check_mk-agent-1.2.4-1.noarch.rpm
   check_mk-agent-logwatch-1.2.4-1.noarch.rpm
   check_mk-agent-oracle-1.2.4-1.noarch.rpm
   mk-livestatus-1.2.4.tar.gz
   mkeventd-1.2.4.tar.gz
 
  What regex can I use to obtain only the string value 1.2.4 from the
  file names (or whatever future versions, based on 3 numbers separated
  by dots, [0-9].[0-9].[0-9])?
 
  Thanks!
  Wernher

 /\b(\d+\.\d+\.\d+)\b/


 --
 Don't stop where the ink does.
 Shawn






Re: regex to get version from file name

2014-02-23 Thread shawn wilson
Use LWP to get web data - not lynx and the like unless you can't help it. I
prefer using Web::Scraper to parse html, but either way it's probably best
not to use a regex (see SO and similar sites for discussions on the subject).

On Feb 23, 2014 8:13 AM, Wernher Eksteen crypt...@gmail.com wrote:

 Hi,

 Thanks, but how do I assign the value found by the regex to a variable, so
that the 1.2.4 from the 6 file names in the array @fileList is printed only
once, and, if other versions such as 1.2.5 and 1.2.6 are found, the unique
values from all of them are printed?

 This is my script thus far. The aim of this script is to connect to the
site, remove all html tags and obtain only the file names I need.

 #!/usr/bin/perl

 use strict;
 use warnings;

 # initialise the arrays for the package names used later
 my @getList;
 my @fileList;

 # get files using lynx and parse through it
 my $url = "http://mathias-kettner.com/download";
 open my $in, "lynx -dump $url |" or die $!;

 # get the bits we need and push them to an array to filter further
 while(<$in>){
   chomp;
   if( /\[(\d+)\](.+)/ ){
     next if $1 == 1;
     push @getList, "$2\n";
   }
 }

 # filter only the files we need into the final array
 foreach my $i (@getList) {
   my @list = split /\s+/, $i;
   push @fileList, "$list[0]\n" if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/;
 }

 # print the list
 print "\nList of files to be retrieved from $url:\n\n @fileList\n";

 The output is then:

 List of files to be retrieved from http://mathias-kettner.com/download:


  check_mk-1.2.4.tar.gz
  check_mk-agent-1.2.4-1.noarch.rpm
  check_mk-agent-logwatch-1.2.4-1.noarch.rpm
  check_mk-agent-oracle-1.2.4-1.noarch.rpm
  mk-livestatus-1.2.4.tar.gz
  mkeventd-1.2.4.tar.gz

 From that I want to get the value 1.2.4 and assign it to a variable. If
there is more than one value, such as 1.2.5 and 1.2.6 as well, it should
print those too, but only the unique values.

 My attempt to print only the value 1.2.4 is shown below, but it prints
1.2.41.2.41.2.41.2.41.2.41.2.4 all run together; if I add a newline to $i,
such as "$i\n", it then prints 11?

 foreach my $i (@fileList) {
 print $i =~  /\b(\d+\.\d+\.\d+)\b/;
 }


The 1s are the return value of a successful match (true, i.e. one match).
You want to print "$i\n" if (foo)
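A short illustration of why the output looked like that (the two file names come from the thread; everything else is illustrative). print supplies list context, so the match returns the captured text and the versions run together; using the match as an operand of the . operator puts it in scalar context, where a successful match returns 1:

```perl
use strict;
use warnings;

my @fileList = ( 'check_mk-1.2.4.tar.gz', 'mkeventd-1.2.4.tar.gz' );

my ( @list_results, @scalar_results );
for my $i (@fileList) {
    # List context (push's argument list): the match returns the capture.
    push @list_results, $i =~ /\b(\d+\.\d+\.\d+)\b/;

    # Scalar context: the match returns 1 (true) on success.
    push @scalar_results, scalar( $i =~ /\b(\d+\.\d+\.\d+)\b/ );

    # What was usually wanted: test the match, then print $1 with a newline.
    print "$1\n" if $i =~ /\b(\d+\.\d+\.\d+)\b/;
}

print "@list_results\n";      # prints "1.2.4 1.2.4"
print "@scalar_results\n";    # prints "1 1"
```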


Re: regex to get version from file name

2014-02-23 Thread Wernher Eksteen
Thanks, I've changed it to use LWP.

I'm not sure how to download the actual file with LWP, so I've tried
File::Fetch, which works, but it doesn't show any download progress or
status; it just hangs with a blank screen until the download completes. Any
pointers on getting download status/progress details?

foreach my $i (@fileList2) {
  next unless $i =~ m/$getMenuItem/;
  my $file = "$url/$i";
  chomp($file);
  my $ff = File::Fetch->new(uri => $file);
  my $where = $ff->fetch() or die $ff->error;
}

Thanks,
Wernher



On Sun, Feb 23, 2014 at 4:35 PM, shawn wilson ag4ve...@gmail.com wrote:

 Use LWP to get web data - not lynx and the like unless you can't help it.
 I prefer using Web::Scraper to parse html, but either way it's probably best
 not to use a regex (see SO and similar sites for discussions on the subject).

 On Feb 23, 2014 8:13 AM, Wernher Eksteen crypt...@gmail.com wrote:
 
  Hi,
 
  Thanks, but how do I assign the value found by the regex to a variable,
 so that the 1.2.4 from the 6 file names in the array @fileList is printed
 only once, and, if other versions such as 1.2.5 and 1.2.6 are found, the
 unique values from all of them are printed?
 
  This is my script thus far. The aim of this script is to connect to the
 site, remove all html tags and obtain only the file names I need.
 
  #!/usr/bin/perl
 
  use strict;
  use warnings;
 
  # initialise the arrays for the package names used later
  my @getList;
  my @fileList;
 
  # get files using lynx and parse through it
  my $url = "http://mathias-kettner.com/download";
  open my $in, "lynx -dump $url |" or die $!;
 
  # get the bits we need and push them to an array to filter further
  while(<$in>){
    chomp;
    if( /\[(\d+)\](.+)/ ){
      next if $1 == 1;
      push @getList, "$2\n";
    }
  }
 
  # filter only the files we need into the final array
  foreach my $i (@getList) {
    my @list = split /\s+/, $i;
    push @fileList, "$list[0]\n" if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/;
  }
 
  # print the list
  print "\nList of files to be retrieved from $url:\n\n @fileList\n";
 
  The output is then:
 
  List of files to be retrieved from http://mathias-kettner.com/download:
 
 
   check_mk-1.2.4.tar.gz
   check_mk-agent-1.2.4-1.noarch.rpm
   check_mk-agent-logwatch-1.2.4-1.noarch.rpm
   check_mk-agent-oracle-1.2.4-1.noarch.rpm
   mk-livestatus-1.2.4.tar.gz
   mkeventd-1.2.4.tar.gz
 
  From that I want to get the value 1.2.4 and assign it to a variable. If
 there is more than one value, such as 1.2.5 and 1.2.6 as well, it should
 print those too, but only the unique values.
 
  My attempt to print only the value 1.2.4 is shown below, but it prints
 1.2.41.2.41.2.41.2.41.2.41.2.4 all run together; if I add a newline to
 $i, such as "$i\n", it then prints 11?
 
  foreach my $i (@fileList) {
  print $i =~  /\b(\d+\.\d+\.\d+)\b/;
  }
 

 The 1s are the return value of a successful match (true, i.e. one match).
 You want to print "$i\n" if (foo)



Re: regex to get version from file name

2014-02-23 Thread Jim Gibson

On Feb 21, 2014, at 6:21 AM, Wernher Eksteen crypt...@gmail.com wrote:

 Hi all,
 
 From the below file names I only need the version number 1.2.4 without 
 explicitly specifying it.
 
  check_mk-1.2.4.tar.gz
  check_mk-agent-1.2.4-1.noarch.rpm
  check_mk-agent-logwatch-1.2.4-1.noarch.rpm
  check_mk-agent-oracle-1.2.4-1.noarch.rpm
  mk-livestatus-1.2.4.tar.gz
  mkeventd-1.2.4.tar.gz
 
 What regex can I use to obtain only the string value 1.2.4 from the file 
 names (or whatever future versions, based on 3 numbers separated by dots, 
 [0-9].[0-9].[0-9])?

Here’s one that will do any number of digits, provided they are preceded by a 
hyphen and followed by a hyphen or period (like all of your samples):

  /-([\d.]+)[.-]/
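Jim's pattern applied to two of the sample names. Because [\d.]+ accepts any run of digits and dots, it also handles versions with a different number of parts; the name foo-2.0.tar.gz is hypothetical and only included to show that:

```perl
use strict;
use warnings;

my @names = (
    'check_mk-1.2.4.tar.gz',
    'check_mk-agent-logwatch-1.2.4-1.noarch.rpm',
    'foo-2.0.tar.gz',    # hypothetical two-part version
);

my @versions;
for my $name (@names) {
    # A hyphen, then digits and dots (captured), then a dot or a hyphen.
    # Backtracking gives back the trailing "." so that [.-] can match it.
    push @versions, $1 if $name =~ /-([\d.]+)[.-]/;
}

print "@versions\n";    # prints "1.2.4 1.2.4 2.0"
```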




Re: regex to get version from file name

2014-02-23 Thread Jim Gibson

On Feb 23, 2014, at 5:10 AM, Wernher Eksteen crypt...@gmail.com wrote:

 Hi,
 
 Thanks, but how do I assign the value found by the regex to a variable, so 
 that the 1.2.4 from the 6 file names in the array @fileList is printed only 
 once, and, if other versions such as 1.2.5 and 1.2.6 are found, the unique 
 values from all of them are printed?
 
 
 From that I want to get the value 1.2.4 and assign it to a variable. If there 
 is more than one value, such as 1.2.5 and 1.2.6 as well, it should print those 
 too, but only the unique values.
 
 My attempt to print only the value 1.2.4 is shown below, but it prints 
 1.2.41.2.41.2.41.2.41.2.41.2.4 all run together; if I add a newline to $i, 
 such as "$i\n", it then prints 11?
 
 foreach my $i (@fileList) {
 print $i =~  /\b(\d+\.\d+\.\d+)\b/;
 }

The parentheses in the above regular expression cause the matched substrings to 
be assigned to $1. If you wish to print those values, print $1 or assign the 
value of $1 to another variable and print it:

  if( $i =~  /\b(\d+\.\d+\.\d+)\b/ ) {
    print "$1\n";
  }

If you wish to find all of the unique values of what is captured, use the 
values as keys in a hash and print the keys after all the lines have been 
processed (untested):

my %unique;
foreach my $i (@fileList) {
  if( $i =~  /\b(\d+\.\d+\.\d+)\b/ ) {
    $unique{$1}++;
  }
}
for my $number ( sort keys %unique ) {
  print "Version $number had $unique{$number} files\n";
}






Re: regex to get version from file name

2014-02-23 Thread Wernher Eksteen
Thanks, this also worked for me...

my @versions;
foreach my $i (@fileList) {
    push @versions, $i =~ m/\b(\d+\.\d+\.\d+)\b/g;
}

my %seen;
my @unique = grep { ! $seen{$_}++ } @versions;
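A self-contained version of this approach that runs under `use strict`, using the six file names from the thread. In list context m//g returns every capture, and `grep { !$seen{$_}++ }` keeps only the first occurrence of each value, in original order:

```perl
use strict;
use warnings;

my @fileList = qw(
    check_mk-1.2.4.tar.gz
    check_mk-agent-1.2.4-1.noarch.rpm
    check_mk-agent-logwatch-1.2.4-1.noarch.rpm
    check_mk-agent-oracle-1.2.4-1.noarch.rpm
    mk-livestatus-1.2.4.tar.gz
    mkeventd-1.2.4.tar.gz
);

my @versions;
foreach my $i (@fileList) {
    # List context: m//g returns every captured version in the name.
    push @versions, $i =~ m/\b(\d+\.\d+\.\d+)\b/g;
}

# %seen counts occurrences; the post-increment is 0 (false) only the first time.
my %seen;
my @unique = grep { !$seen{$_}++ } @versions;

print "$_\n" for @unique;    # prints "1.2.4" once
```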



On Sun, Feb 23, 2014 at 4:27 PM, Jim Gibson j...@gibson.org wrote:


 On Feb 23, 2014, at 5:10 AM, Wernher Eksteen crypt...@gmail.com wrote:

  Hi,
 
  Thanks, but how do I assign the value found by the regex to a variable,
 so that the 1.2.4 from the 6 file names in the array @fileList is printed
 only once, and, if other versions such as 1.2.5 and 1.2.6 are found, the
 unique values from all of them are printed?
 
 
  From that I want to get the value 1.2.4 and assign it to a variable. If
 there is more than one value, such as 1.2.5 and 1.2.6 as well, it should
 print those too, but only the unique values.
 
  My attempt to print only the value 1.2.4 is shown below, but it prints
 1.2.41.2.41.2.41.2.41.2.41.2.4 all run together; if I add a newline to
 $i, such as "$i\n", it then prints 11?
 
  foreach my $i (@fileList) {
  print $i =~  /\b(\d+\.\d+\.\d+)\b/;
  }

 The parentheses in the above regular expression cause the matched
 substrings to be assigned to $1. If you wish to print those values, print
 $1 or assign the value of $1 to another variable and print it:

   if( $i =~  /\b(\d+\.\d+\.\d+)\b/ ) {
     print "$1\n";
   }

 If you wish to find all of the unique values of what is captured, use the
 values as keys in a hash and print the keys after all the lines have been
 processed (untested):

 my %unique;
 foreach my $i (@fileList) {
   if( $i =~  /\b(\d+\.\d+\.\d+)\b/ ) {
     $unique{$1}++;
   }
 }
 for my $number ( sort keys %unique ) {
   print "Version $number had $unique{$number} files\n";
 }







Re: regex to get version from file name

2014-02-23 Thread Wernher Eksteen
Great thank you!


On Fri, Feb 21, 2014 at 6:02 PM, Jim Gibson j...@gibson.org wrote:


 On Feb 21, 2014, at 6:21 AM, Wernher Eksteen crypt...@gmail.com wrote:

  Hi all,
 
  From the below file names I only need the version number 1.2.4 without
 explicitly specifying it.
 
   check_mk-1.2.4.tar.gz
   check_mk-agent-1.2.4-1.noarch.rpm
   check_mk-agent-logwatch-1.2.4-1.noarch.rpm
   check_mk-agent-oracle-1.2.4-1.noarch.rpm
   mk-livestatus-1.2.4.tar.gz
   mkeventd-1.2.4.tar.gz
 
  What regex can I use to obtain only the string value 1.2.4 from the file
 names (or whatever future versions, based on 3 numbers separated by
 dots, [0-9].[0-9].[0-9])?

 Here's one that will do any number of digits, provided they are preceded
 by a hyphen and followed by a hyphen or period (like all of your samples):

   /-([\d.]+)[.-]/





Re: regex to get version from file name

2014-02-21 Thread Shawn H Corey
On Fri, 21 Feb 2014 16:21:57 +0200
Wernher Eksteen crypt...@gmail.com wrote:

 Hi all,
 
 From the below file names I only need the version number 1.2.4 without
 explicitly specifying it.
 
  check_mk-1.2.4.tar.gz
  check_mk-agent-1.2.4-1.noarch.rpm
  check_mk-agent-logwatch-1.2.4-1.noarch.rpm
  check_mk-agent-oracle-1.2.4-1.noarch.rpm
  mk-livestatus-1.2.4.tar.gz
  mkeventd-1.2.4.tar.gz
 
 What regex can I use to obtain only the string value 1.2.4 from the
 file names (or whatever future versions, based on 3 numbers
 separated by dots, [0-9].[0-9].[0-9])?
 
 Thanks!
 Wernher

/\b(\d+\.\d+\.\d+)\b/


-- 
Don't stop where the ink does.
Shawn





Re: regex headache

2014-02-04 Thread Dr.Ruud

On 2014-02-03 21:30, Paul Fontenot wrote:

Hi, I am attempting to write a regex but it is giving me a headache.

I have two log entries

1. Feb  3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [org.apache.commons.logging.impl.Log4JLogger]
2. Feb  3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR]

I am using the following
^\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+\w+\s+\[\d{1,2}:\s+\d{1,2}:\d{1,2},\d{3}\]\s+\w+\s+\[[a-zA-Z0-9.]\]

My problem is this greedy little '.' - I need it to be just a period. How do I
match #1 and not match #2?



I think you should replace \[[a-zA-Z0-9.]\] by \[[^]]+\].

Don't worry about matching; see this as parsing, and decide to skip a line
based on how it matches, not on how it doesn't match.


Hint: start using named captures.

If you are into massively scanning log files, try MCE::Grep.
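A sketch of Ruud's suggestions (named captures need Perl 5.10 or later). The last bracketed field uses [^]]+ instead of [a-zA-Z0-9.], and lines are skipped based on what matched. The capture names host, level and source are my own, and the pattern assumes the layout of the two sample lines:

```perl
use strict;
use warnings;

my @lines = (
    'Feb  3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [org.apache.commons.logging.impl.Log4JLogger]',
    'Feb  3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR]',
);

my $log_re = qr{
    \A \w+ \s+ \d{1,2} \s+ \d{1,2}:\d{2}:\d{2}   # syslog timestamp
    \s+ (?<host>\S+)                             # host name
    \s+ \[ [^]]+ \]                              # bracketed application timestamp
    \s+ (?<level>\w+)                            # log level
    \s+ \[ (?<source>[^]]+) \]                   # logger name inside the last brackets
}x;

my @kept;
for my $line (@lines) {
    next unless $line =~ $log_re;
    next if $+{source} eq 'STDERR';    # skip on how it matches
    push @kept, "$+{host} $+{level} $+{source}";
}

print "$_\n" for @kept;
```

Only the first sample line survives; the second is parsed but skipped because its source field is STDERR.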

--
Ruud






Re: regex headache

2014-02-03 Thread Jim Gibson

On Feb 3, 2014, at 12:30 PM, Paul Fontenot wrote:

 Hi, I am attempting to write a regex but it is giving me a headache.
 
 I have two log entries
 
 1. Feb  3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [org.apache.commons.logging.impl.Log4JLogger]
 2. Feb  3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR]
 
 I am using the following
 ^\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+\w+\s+\[\d{1,2}:\s+\d{1,2}:\d{1,2},\d{3}\]\s+\w+\s+\[[a-zA-Z0-9.]\]
 
 My problem is this greedy little '.' - I need it to be just a period. How do I
 match #1 and not match #2?


You appear to be making the job too difficult. The only difference between 
lines 1. and 2. is the last column. To differentiate those two, you can do this 
(assuming the string is in $_):

if( /\[STDERR\]/ ) {
  # process line 2
}else{
  # process line 1
}

Do you really need to match each field in the entire line? If so, I would try 
splitting the lines on whitespace and extracting the columns you need that way. 
Whether or not that works depends upon: 1) how much variation there can be in 
your log entries, and 2) what exactly you need to extract from each entry. 
Fixing that regex may not be the most productive approach in the long term.

As for your specific question, a period in a character class (e.g., [.]) will 
match a period. A period in the regex pattern will match any character (except 
possibly a newline). To match a period character, escape the period: /\./
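A small check of Jim's point on the bracketed field alone. It also shows that [a-zA-Z0-9.] has no quantifier, so on its own it matches exactly one character between the brackets. The $has_dot pattern is only an assumption about one way to accept line 1 but not line 2; nobody in the thread proposed it:

```perl
use strict;
use warnings;

my @tags = (
    '[org.apache.commons.logging.impl.Log4JLogger]',    # from line 1
    '[STDERR]',                                         # from line 2
);

my $one_char = qr/\A\[[a-zA-Z0-9.]\]\z/;     # original class: exactly one character
my $one_more = qr/\A\[[a-zA-Z0-9.]+\]\z/;    # with +; "." inside [] is a literal period
my $has_dot  = qr/\A\[\w+(?:\.\w+)+\]\z/;    # assumption: require at least one escaped \.

for my $tag (@tags) {
    printf "%-48s one_char=%d one_more=%d has_dot=%d\n", $tag,
        ( $tag =~ $one_char ? 1 : 0 ),
        ( $tag =~ $one_more ? 1 : 0 ),
        ( $tag =~ $has_dot  ? 1 : 0 );
}
```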






Re: Regex not working correctly

2013-12-11 Thread Shlomi Fish
Hi punit,

On Wed, 11 Dec 2013 21:04:39 +0530
punit jain contactpunitj...@gmail.com wrote:

 Hi,
 
 I have a requirement where I need to capture phone number from different
 strings.
 
 The strings could be :-
 
 
 1. COMP TEL NO 919369721113  for computer science
 
 2. For Best Discount reach 092108493, from 6-9
 
 3. Your booking Confirmed, 9210833321
 
 4. price for free consultation call92504060
 
 5. price for free consultation call92504060number
 
 I created a regex as below :-
 
 #!/usr/bin/perl
 
 my $line= shift @ARGV;
 
 if($line =~
 /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/)
 {
 
 print "one = $1";
 
 
 }
 It works fine for 1, 2 and 3 and prints the number; however, for 4 and 5 I get
 the number in $2 rather than $1, though I have used the pipe operator to check it.
 
 Any clue how to fix this ?

I suggest you use named captures (a feature of perl-5.10.x-and-above) and then
you can do something like:

my $my_capture = ($+{'capture1'} // $+{'capture2'});

I think this is the best way to do it. (You can also do $1 // $2, but please
don't).
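A minimal sketch of Shlomi's named-capture approach (Perl 5.10 or later). The helper extract_number and the capture names plain and after_keyword are hypothetical, and the alternatives are simplified from Punit's pattern. The keyword is written as call? here, an assumption: with cal directly followed by \d+, the text call92504060 cannot match, because the l is not a digit.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the phone number, or undef if none is found.
sub extract_number {
    my ($line) = @_;
    return unless $line =~ m{
          (?<plain> 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )
        | (?: ph | call? ) (?<after_keyword> \d+ )
    }x;

    # Only one of the two named groups took part in the match.
    return $+{plain} // $+{after_keyword};
}

for my $line (
    'COMP TEL NO 919369721113  for computer science',
    'Your booking Confirmed, 9210833321',
    'price for free consultation call92504060number',
) {
    my $number = extract_number($line);
    print defined $number ? "$number\n" : "no match: $line\n";
}
```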

Regards,

Shlomi Fish


-- 
-
Shlomi Fish   http://www.shlomifish.org/
The Case for File Swapping - http://shlom.in/file-swap

Why can’t we ever attempt to solve a problem in this country without having
a “War” on it? -- Rich Thomson, talk.politics.misc

Please reply to list if it's a mailing list post - http://shlom.in/reply .





Re: Regex not working correctly

2013-12-11 Thread Jim Gibson

On Dec 11, 2013, at 7:34 AM, punit jain contactpunitj...@gmail.com wrote:

 Hi,
 
 I have a requirement where I need to capture phone number from different 
 strings.
 
 The strings could be :-
 
 
 1. COMP TEL NO 919369721113  for computer science
 
 2. For Best Discount reach 092108493, from 6-9
 
 3. Your booking Confirmed, 9210833321
 
 4. price for free consultation call92504060
 
 5. price for free consultation call92504060number
 
 I created a regex as below :-
 
 #!/usr/bin/perl
 
 my $line= shift @ARGV;
 
 if($line =~ 
 /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/)
  {
 print "one = $1";
 
 
 
 }
 
 It works fine for 1, 2 and 3 and prints the number; however, for 4 and 5 I get 
 the number in $2 rather than $1, though I have used the pipe operator to check it.
 
 Any clue how to fix this ?

Your first step is to rewrite the regular expression using the extended syntax 
x modifier and add some whitespace:
 
if($line =~
  m{
    (?:
      (?: \D+ | \s+ )
      (?:
        (
          91\d{10} |
          0\d{10} |
          [7-9]\d{9} |
          0\d{11}
        ) |
        (?:
          (?:
            ph |
            cal
          )
          (\d+)
        )
      )
    ) |
    (?:
      (?:
        ( 91\d{10} |
          0\d{10} |
          [7-9]\d{9} |
          0\d{11} ) |
        (?:
          (?:
            ph |
            cal
          )
          (\d+)
        )
      )
      (?:
        \D+ |
        \s+
      )
    )
  }x
) {

Then maybe you will have some hope of figuring out why it doesn’t work (I 
certainly can’t). 

I suggest you break it up into a series of if-then-else statements:

  if( $line =~ /(91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11})/x ) {
    $number = $1;
  }elsif( $line =~ /(?:ph|cal)(\d+)/ ) {
    $number = $1;
  }elsif( … ) {
  }else{
    print "No match for $line";
  }

You don’t need to do it all in one regex. Debugging each of those smaller 
regexes will be easier than debugging the whole thing.
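Jim's idea of a series of smaller regexes, as a runnable sketch. The helper name find_number is hypothetical. Each pattern has only one capturing group, so the number always ends up in $1. The keyword pattern uses call? (an assumption) so that call92504060 can match:

```perl
use strict;
use warnings;

# Hypothetical helper: try each simpler pattern in turn.
sub find_number {
    my ($line) = @_;
    if ( $line =~ /(91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11})/x ) {
        return $1;    # a bare number
    }
    elsif ( $line =~ /(?:ph|call?)(\d+)/ ) {
        return $1;    # digits right after a keyword
    }
    else {
        warn "No match for $line\n";
        return;
    }
}

for my $line (
    'COMP TEL NO 919369721113  for computer science',
    'Your booking Confirmed, 9210833321',
    'price for free consultation call92504060number',
) {
    my $number = find_number($line);
    print "$number\n" if defined $number;
}
```

Each branch can now be tested and adjusted on its own.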







Re: Regex not working correctly

2013-12-11 Thread punit jain
Thanks Shlomi, that's a good idea. However, I was also trying to
understand whether something is wrong in my regex. Why would $2 capture the
number, given that I have used:

(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))

In my understanding, this would match either a number with the regex
91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}
or "call" followed by digits.

In case 4 (price for free consultation call92504060), why would $1 store
an empty string while $2 actually stores the number part?

Regards,
Punit


RE: Regex not working correctly

2013-12-11 Thread vijaya R
Hi,

You can try the below pattern.

if($line=~/([0-9]{3,})/gs) {
print $1;
}

Thanks,
Vijaya

--
From: punit jain
Sent: 12/11/2013 9:07 PM
To: beginners@perl.org
Subject: Regex not working correctly

Hi,

I have a requirement where I need to capture phone number from different
strings.

The strings could be :-


1. COMP TEL NO 919369721113  for computer science

2. For Best Discount reach 092108493, from 6-9

3. Your booking Confirmed, 9210833321

4. price for free consultation call92504060

5. price for free consultation call92504060number

I created a regex as below :-

#!/usr/bin/perl

my $line= shift @ARGV;

if($line =~
/(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/)
{

print "one = $1";


}
It works fine for 1, 2 and 3 and prints the number; however, for 4 and 5 I get
the number in $2 rather than $1, though I have used the pipe operator to check it.

Any clue how to fix this ?


Re: Regex not working correctly

2013-12-11 Thread Robert Wohlfarth
On Wed, Dec 11, 2013 at 10:35 AM, punit jain contactpunitj...@gmail.com wrote:


 Thanks Shlomi, that's a good idea. However, I was also trying to
 understand whether something is wrong in my regex. Why would $2 capture the
 number, given that I have used:

 (?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))

 In my understanding, this would match either a number with the regex
 91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}
 or "call" followed by digits.

 In case 4 (price for free consultation call92504060), why would $1
 store an empty string while $2 actually stores the number part?


There are two sets of capturing parentheses:
* (91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}) = $1
* (\d+) = $2

The first set stores its match in $1 and the second set in $2. The pipe
(or) does not reset the capture counter back to 1. The counter strictly
goes from left to right.
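Robert's explanation can be checked directly. The sketch also shows Perl's branch-reset group (?| ... ), available since Perl 5.10 and not mentioned in the thread: inside it, each alternative starts numbering its groups from the same point, so the number lands in $1 whichever branch matched. The keyword is written as call?, an assumption so that call92504060 can match:

```perl
use strict;
use warnings;

my $line = 'price for free consultation call92504060';

# Ordinary alternation: groups are numbered left to right, so the keyword
# branch fills $2 and leaves $1 undefined.
my @plain = ( $line =~ /(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:ph|call?)(\d+)/ );

# Branch reset: both alternatives put their capture into $1.
my @branch = ( $line =~ /(?|(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:ph|call?)(\d+))/ );

printf "plain:  \$1=%s \$2=%s\n", map { $_ // 'undef' } @plain;
printf "branch: \$1=%s\n", @branch;
```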

-- 
Robert Wohlfarth


Re: Regex not working correctly

2013-12-11 Thread punit jain
That answers my question.

Thanks Robert


Re: Regex help needed

2013-01-09 Thread *Shaji Kalidasan*
Punit Jain,

This is not optimized code, but you can refactor it. It works for the 
given scenario, regardless of the order of the input data.

Hope it helps to some extent.

[code]
use strict;
use warnings;

my $var = '';
my @args = ();
my %hash;

while (<DATA>) {
    chomp;
    my ($var,$arg) = split /=/,$_,2;
    if($var eq '{') {
        @args = (); #Reset if we encounter '{'
    }
    # lines such as '{' and '}' have no '=', so $arg is undefined
    my @arg1 = defined $arg ? split(/,/,$arg) : ();
    if(scalar @arg1 > scalar @args) {
        $hash{$var} = $arg unless($var eq '{' || $var eq '}');
        @args = @arg1;
    }
}

foreach my $k (sort keys %hash) {
    print "$k = $hash{$k}\n";
}

__DATA__
{
test = (test123);
test = (test123,abc,xyz);
test = (test123,abc);
}
{
test1 = (passfile,pasfile1,user);
test1 = (passfile);
test1 = (passfile,pasfile1);
}
{
test2 = (temp);
test2 = (temp,temp1);
test2 = (temp,temp1,username);
}
{
test3 = (betty,betty1,jack);
test3 = (betty,betty1);
test3 = (betty);
}
[/code]

[output]
test  =  (test123,abc,xyz);
test1  =  (passfile,pasfile1,user);
test2  =  (temp,temp1,username);
test3  =  (betty,betty1,jack);
[/output]
 
best,
Shaji 
---
Your talent is God's gift to you. What you do with it is your gift back to God.
---



 From: punit jain contactpunitj...@gmail.com
To: beginners@perl.org beginners@perl.org 
Sent: Tuesday, 8 January 2013 5:58 PM
Subject: Regex help needed
 
Hi ,

I have a file as below : -

{
test = (test123);
test = (test123,abc);
test = (test123,abc,xyz);
}
{
test1 = (passfile);
test1 = (passfile,pasfile1);
test1 = (passfile,pasfile1,user);
}

and so on 

The requirement is to have the file parsing so that final output is  :-

test = (test123,abc,xyz);
test1 = (passfile,pasfile1,user);

So basically only pick the lines with maximum number of options for each
type.

Regards.

Re: Regex help needed

2013-01-09 Thread Dr.Ruud

On 2013-01-08 13:28, punit jain wrote:


{
test = (test123);
test = (test123,abc);
test = (test123,abc,xyz);
}
{
test1 = (passfile);
test1 = (passfile,pasfile1);
test1 = (passfile,pasfile1,user);
}

and so on 

The requirement is to have the file parsing so that final output is  :-

test = (test123,abc,xyz);
test1 = (passfile,pasfile1,user);

So basically only pick the lines with maximum number of options for each
type.


Or just print the last long line:

echo '{
test = (test123);
test = (test123,abc);
test = (test123,abc,xyz);
}
{
test1 = (passfile);
test1 = (passfile,pasfile1);
test1 = (passfile,pasfile1,user);
}
' |perl -wne'$o=$n||0;$p=$_,next if($n=length)$o;$n=3;print$p'

test = (test123,abc,xyz);
test1 = (passfile,pasfile1,user);


Which preserves order too. :)

--
Ruud






Re: Regex help needed

2013-01-08 Thread Jim Gibson

On Jan 8, 2013, at 4:28 AM, punit jain wrote:

 Hi ,
 
 I have a file as below : -
 
 {
 test = (test123);
 test = (test123,abc);
 test = (test123,abc,xyz);
 }
 {
 test1 = (passfile);
 test1 = (passfile,pasfile1);
 test1 = (passfile,pasfile1,user);
 }
 
 and so on 
 
 The requirement is to have the file parsing so that final output is  :-
 
 test = (test123,abc,xyz);
 test1 = (passfile,pasfile1,user);
 
 So basically only pick the lines with maximum number of options for each
 type.

The easiest solution I can think of would be to extract the first token on each 
line, use that token as a hash key, count the number of commas in each line, 
and save the line in the hash with the largest number of commas for each key. 
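Here is a minimal sketch of that hash-based approach. It assumes every line of interest looks like `name = (a,b,c);` and that the values themselves contain no commas (see the caveat that follows). The helper name `longest_per_key` is hypothetical. Because the code compares comma counts, the longest line does not have to be the last one in its group. The helper also remembers the order in which keys first appear.

```perl
use strict;
use warnings;

# For each key (the first token before '='), keep the line with the most
# commas, i.e. the most options. Assumes values contain no embedded commas.
sub longest_per_key {
    my @lines = @_;
    my ( %best_line, %best_count, @order );
    for my $line (@lines) {
        my ($key) = $line =~ /^\s*(\w+)\s*=/ or next;    # skips '{', '}' etc.
        my $commas = ( $line =~ tr/,// );                 # tr/// returns the count
        push @order, $key unless exists $best_line{$key};
        if ( !exists $best_count{$key} || $commas > $best_count{$key} ) {
            $best_line{$key}  = $line;
            $best_count{$key} = $commas;
        }
    }
    return map { $best_line{$_} } @order;    # keys in first-seen order
}

# Sample input; the longest "test" line is deliberately not the last one.
my @input = map { "$_\n" } '{', 'test = (test123,abc,xyz);', 'test = (test123);',
  '}', '{', 'test1 = (passfile);', 'test1 = (passfile,pasfile1,user);', '}';

print longest_per_key(@input);
```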

This will not work if your strings have commas. In that case, you might want to 
consider using a parsing module, such as Text::CSV, that will correctly handle 
your input data. You can use Text::CSV to split your input lines into fields 
and count the number of fields. However, you will first have to extract the 
quoted strings from the surrounding parentheses. You can use the Text::Balanced 
module to do that. Both Text::CSV and Text::Balanced are available at CPAN 
(http://search.cpan.org).

The best way for you to learn programming will be to attempt writing a program 
to accomplish your task, then post your program if you have trouble getting it 
to do what you want.

Good luck.







Re: Regex help needed

2013-01-08 Thread timothy adigun
Hi punit jain,

 Please check my comments below.

On Tue, Jan 8, 2013 at 1:28 PM, punit jain contactpunitj...@gmail.com wrote:

 [snipped quote of the original post]


I basically agree with Jim on this:
Jim> ... to learn programming will be to attempt writing a program to
Jim> accomplish your task, then post your program if you have trouble
Jim> getting it to do what you want.

However, I would suggest using a hash, provided the line with the maximum
number of options for each type *is the last one in each case*. A hash
*permits only one value per key*, so each later line overwrites the earlier
one. Splitting each line on = gives the key and value for the hash.

So, based on the data presented, one can write like so:

use warnings;
use strict;

my %collection_hash;

while (<DATA>) {
    chomp;
    if (/=/) {
        my ( $key, $value ) = split /=/, $_, 2;
        $collection_hash{$key} = $value;
    }
}

print $_, ' = ', $collection_hash{$_}, $/ for sort keys %collection_hash;

__DATA__
{
test = (test123);
test = (test123,abc);
test = (test123,abc,xyz);
}
{
test1 = (passfile);
test1 = (passfile,pasfile1);
test1 = (passfile,pasfile1,user);
}


*OUTPUT:*
test  =  (test123,abc,xyz);
test1  =  (passfile,pasfile1,user);

Please *NOTE* that this only works as you want if the last line in each
case has the most options, which is what the data you showed here
presents.





-- 
Tim


Re: Regex issue

2013-01-06 Thread midhun
Ya, this code is perfect Punit. This works fine for me too.

Regards,
Midhun

On Thu, Jan 3, 2013 at 4:46 PM, Paul Johnson p...@pjcj.net wrote:

 [snipped quote of Paul Johnson's reply]





Re: Regex issue

2013-01-03 Thread Shlomi Fish
Hi Punit,

some comments on your code:

On Thu, 3 Jan 2013 15:53:20 +0530
punit jain contactpunitj...@gmail.com wrote:

 Hi,
 
 I am facing issues in parsing using Regex. The problem definition is as
 below : -
 
 A file with data :-
 
 BEGIN
 country Japan
 passcode 1123
 listname sales
 contact ch...@example.com
 contact m...@example.com
 END
 
 BEGIN
 country Namibia
 passcode 9801
 listname dept
 contact l...@example.com
 END
 
 BEGINDL
 country US
 passcode 4123
 listname Investment
 member a...@example.com
 member b...@example.com
 ENDDL
 
 BEGIN
 country US
 passcode 4432
 listname testing
 contact lore...@test.com
 contact a...@test.com
 END
 ..
 .
 ...
 ..
 .
 
 I want to parse it in such a way that  all data with BEGIN and END goes in
 one file and BEGINDL and ENDDL goes in other with kind of processing I want
 to so.
 
 I am using below code but doesnot work : -
 
 #!/usr/bin/perl

use strict; use warnings;

 my $file=shift;

Don't call variables "file". In your case it should be "filename".

 open( FH , $file ) or die("open failed: $!\n");

Don't use bareword filehandles or two-arg open.

 open ($fh1, ">/tmp/a");
 open ($fh2, ">/tmp/b");

Use autodie and three-arg open.

 my $check=0;
 while (<FH>) {

Use chomp, and iterate over the lines with a lexical variable (say $line or
$l) instead of $_, which can be clobbered and devastated very easily.

 #next unless /BEGIN/ .. /END/ || /BEGINDL/ .. /ENDDL/ || eof;
 if($_ =~ /BEGIN$/ || ($check == 0) ) {

You probably want « $_ eq 'BEGIN' » instead (after chomp).

 print $fh1 $_;
 $check = 0;
 if($_ =~ /END$/) {
 $check = 2;
 }
 }elsif($_ =~ /BEGINDL/ || ($check == 1)) {
 print $fh2 $_;
 $check = 1;
 if($_ =~ /ENDDL/) {
 $check = 2;
 }
 }
 next unless($check == 2);

Always label your nexts (and in this case I think it is redundant).

See:

http://perl-begin.org/tutorials/bad-elements/
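Here is a rough sketch of how those suggestions might fit together: strict and warnings, lexical filehandles, three-arg open, autodie, chomp, exact `eq` comparisons and a labelled `next`. This is one possible restructuring, not code from the thread. The function name `split_records` is hypothetical, and the `$check` flag is replaced by a reference to the list currently being filled. The output paths `/tmp/a` and `/tmp/b` come from the original post. The exact comparisons assume each marker line contains nothing but its tag.

```perl
use strict;
use warnings;
use autodie;    # open and close now die with a useful message on failure

# Route BEGIN..END records to one list and BEGINDL..ENDDL records to another.
# Lines outside any record are skipped. Returns two array refs.
sub split_records {
    my @lines = @_;
    my ( @plain, @dl );
    my ( $target, $end_tag );    # list being filled, and the tag that closes it

  LINE: for my $line (@lines) {
        chomp( my $tag = $line );    # compare a chomped copy, keep $line intact
        if ( !$target ) {
            if    ( $tag eq 'BEGIN' )   { ( $target, $end_tag ) = ( \@plain, 'END' ) }
            elsif ( $tag eq 'BEGINDL' ) { ( $target, $end_tag ) = ( \@dl, 'ENDDL' ) }
            else                        { next LINE }    # text between records
        }
        push @{$target}, $line;
        undef $target if $tag eq $end_tag;
    }
    return ( \@plain, \@dl );
}

# Hypothetical driver: input file name from the command line.
if (@ARGV) {
    open my $in,    '<', $ARGV[0];
    open my $out_a, '>', '/tmp/a';
    open my $out_b, '>', '/tmp/b';
    my ( $plain, $dl ) = split_records(<$in>);
    print {$out_a} @{$plain};
    print {$out_b} @{$dl};
    close $_ for $in, $out_a, $out_b;
}
```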

Regards,

Shlomi Fish

-- 
-
Shlomi Fish   http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/

Bigamy: Having one wife too many.
Monogamy: The same thing!   — Unknown source.

Please reply to list if it's a mailing list post - http://shlom.in/reply .





Re: Regex issue

2013-01-03 Thread Paul Johnson
On Thu, Jan 03, 2013 at 03:53:20PM +0530, punit jain wrote:
 Hi,
 
 I am facing issues in parsing using Regex. The problem definition is as
 below : -

 I want to parse it in such a way that  all data with BEGIN and END goes in
 one file and BEGINDL and ENDDL goes in other with kind of processing I want
 to so.
 
 I am using below code but doesnot work : -

What doesn't work?  It seems fine to me.

 #!/usr/bin/perl
 my $file=shift;
 open( FH , $file ) or die("open failed: $!\n");
 open ($fh1, ">/tmp/a");
 open ($fh2, ">/tmp/b");
 my $check=0;

You probably want $check = 2 here.

 while (<FH>) {
 #next unless /BEGIN/ .. /END/ || /BEGINDL/ .. /ENDDL/ || eof;
 if($_ =~ /BEGIN$/ || ($check == 0) ) {
 print $fh1 $_;
 $check = 0;
 if($_ =~ /END$/) {
 $check = 2;
 }
 }elsif($_ =~ /BEGINDL/ || ($check == 1)) {
 print $fh2 $_;
 $check = 1;
 if($_ =~ /ENDDL/) {
 $check = 2;
 }
 }
 next unless($check == 2);
 }
 
 Any better suggestion ?

Depends on how you define better, but perhaps

 $ perl -ne 'print if /BEGIN$/ .. /END$/' < file > /tmp/a
 $ perl -ne 'print if /BEGINDL$/ .. /ENDDL$/' < file > /tmp/b
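Both one-liners can also be folded into a single pass over the file, because each `..` operator keeps its own true/false state. Here is a minimal sketch. The `route_lines` helper is hypothetical and returns the two groups of lines instead of writing them; to use it, pass it `<$fh>` and print each list to its output file. The flip-flop state lives in the operator itself, so it carries over between calls if a range is left open.

```perl
use strict;
use warnings;

# One pass, two flip-flops: each `..` tracks its own range independently.
# Anchoring both ends keeps /^BEGIN$/ from matching BEGINDL and /^END$/
# from matching ENDDL.
sub route_lines {
    my @lines = @_;
    my ( @plain, @dl );
    for (@lines) {
        push @plain, $_ if /^BEGIN$/   .. /^END$/;
        push @dl,    $_ if /^BEGINDL$/ .. /^ENDDL$/;
    }
    return ( \@plain, \@dl );
}
```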

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: Regex help

2012-12-22 Thread Paul Johnson
On Sat, Dec 22, 2012 at 04:45:21PM +0530, punit jain wrote:
 Hi,
 
 I have a file like below : -
 
 BEGIN:VCARD
 VERSION:2.1
 EMAIL:te...@test.com
 FN:test1
 REV:20101116T030833Z
 UID:644938456.1419.
 END:VCARD
 
 From (S___-0003) Tue Nov 16 03:10:15 2010
 content-class: urn:content-classes:person
 Date: Tue, 16 Nov 2010 11:10:15 +0800
 Subject: test
 Message-ID: 644938507.1420
 MIME-Version: 1.0
 Content-Type: text/x-vcard; charset=utf-8
 
 BEGIN:VCARD
 VERSION:2.1
 EMAIL:te...@test.com
 FN:test2
 REV:20101116T031015Z
 UID:644938507.1420
 END:VCARD
 
 
 
 My requirement is to get all text between BEGIN:VCARD and END:VCARD and all
 the instances. So o/p should be :-
 
 BEGIN:VCARD
 VERSION:2.1
 EMAIL:te...@test.com
 FN:test1
 REV:20101116T030833Z
 UID:644938456.1419.
 END:VCARD
 
 BEGIN:VCARD
 VERSION:2.1
 EMAIL:te...@test.com
 FN:test2
 REV:20101116T031015Z
 UID:644938507.1420
 END:VCARD
 
 I am using below regex  :-
 
 my $fh = IO::File->new($file, "r");
 my $script = do { local $/; <$fh> };
 close $fh;
 if (
$script =~ m/
 (^BEGIN:VCARD\s*(.*)
 ^END:VCARD\s+)/sgmix
 ){
 print OUTFILE $1."\n";
 }
 
 However it just prints 1st instance and not all.

It also prints the text between the two instances, right?

 Any suggestions ?

You need a non greedy match .*? instead of the greedy match .* that you
are using.  Then you'll need to use while instead of if.
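Applied to the slurped string, that advice might look like the sketch below. The `.*?` stops each match at the nearest `END:VCARD`, and `while` with `/g` continues matching from where the previous match ended. The `vcard_blocks` name and the sample text are hypothetical. The pattern assumes plain `\n` line endings; a file with CRLF endings would need `\r?\n`.

```perl
use strict;
use warnings;

# Collect every BEGIN:VCARD ... END:VCARD block from a slurped string.
# .*? is non-greedy, so each match stops at the FIRST following END:VCARD;
# while + /g resumes each search where the previous match ended.
sub vcard_blocks {
    my ($text) = @_;
    my @blocks;
    while ( $text =~ /^(BEGIN:VCARD\n.*?^END:VCARD\n?)/gms ) {
        push @blocks, $1;
    }
    return @blocks;
}

my $text = "BEGIN:VCARD\nFN:test1\nEND:VCARD\n\n"
         . "Subject: test\n\n"
         . "BEGIN:VCARD\nFN:test2\nEND:VCARD\n";

print "$_\n" for vcard_blocks($text);
```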

Or perhaps you'd prefer:

 $ perl -ne 'print if /BEGIN:VCARD/ .. /END:VCARD/' < in > out

or

 $ perl -n00e 'print if /^BEGIN:VCARD/' < in > out

See perldoc perlrun for the switches, and "Range Operators" in perldoc
perlop for the .. operator.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: Regex help

2012-12-22 Thread David Precious
On Sat, 22 Dec 2012 16:45:21 +0530
punit jain contactpunitj...@gmail.com wrote:

 Hi,
 
 I have a file like below : -

[snipped example - vcards with mail headers etc in between]


 My requirement is to get all text between BEGIN:VCARD and END:VCARD
 and all the instances. So o/p should be :-
[...]
 I am using below regex  :-
[...]
 Any suggestions ?

You've already had a reply indicating how to solve the problem you were
having with regexes, so I won't touch on that.

What I will advise, is that for any task you're trying to accomplish,
there's a pretty good chance someone has already solved that and made
code available on CPAN that will help you - so always check CPAN first,
to avoid unnecessarily reinventing the wheel each time (unless you're
doing so solely for a learning experience, of course).

In this case, parsing vcards is likely a common task - a quick look on
CPAN turns up Text::vCard::Addressbook:

https://metacpan.org/module/Text::vCard::Addressbook


From the synopsis:

  use Text::vCard::Addressbook;
 
  my $address_book = Text::vCard::Addressbook->new(
  { 'source_file' => '/path/to/address.vcf', } );
 
  foreach my $vcard ( $address_book->vcards() ) {
  print "Got card for " . $vcard->fullname() . "\n";
  }

It will ignore the non-vcard content in the example you provided, and
just provide you easy access to the data from each vcard.

That's a much nicer approach than extracting it yourself with regexes.

Cheers

Dave P


-- 
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin    www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan        www.preshweb.co.uk/github







Re: Regex help

2012-12-22 Thread Rob Dixon

On 22/12/2012 11:15, punit jain wrote:

[snipped quote of the original post]


This is very simply done with Perl's range operator. See the program
below.

Rob


use strict;
use warnings;

open my $fh, '<', 'vcard.txt' or die $!;

while (<$fh>) {
  print if /^BEGIN:VCARD/ .. /^END:VCARD/;
}

**output**

BEGIN:VCARD
VERSION:2.1
EMAIL:te...@test.com
FN:test1
REV:20101116T030833Z
UID:644938456.1419.
END:VCARD
BEGIN:VCARD
VERSION:2.1
EMAIL:te...@test.com
FN:test2
REV:20101116T031015Z
UID:644938507.1420
END:VCARD






Re: Regex one-liner to find several multi-line blocks of text in a single file

2012-11-01 Thread Paul Johnson
On Thu, Nov 01, 2012 at 12:44:08AM -0700, Thomas Smith wrote:
 Hi,
 
 I'm trying to search a file for several matching blocks of text. A sample
 of what I'm searching through is below.
 
 What I want to do is match # START block # through to the next
 # END block # and repeat that throughout the file without
 matching any of the text that falls between each matched block (that is,
 the ok: some text lines should not be matched). Here is the one-liner I'm
 using:
 
 perl -p -e '/^# START block #.*# END block #$/s' file.txt
 
 I've tried a few variations of this but with the same result--a match is
 being made from the first # START block # to the last # END
 block #, and everything in between... I believe that the .*,
 combined with the s modifier, in the regex is causing this match to be
 made.
 
 What I'm not sure how to do is tell Perl to search from START to the next
 END and then start the search pattern over again with the next START-END
 match.
 
 How might I go about achieving this?

perl -ne 'print if /# START block #/ .. /# END block #/' file.txt

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: Regex one-liner to find several multi-line blocks of text in a single file

2012-11-01 Thread Jim Gibson

On Nov 1, 2012, at 12:44 AM, Thomas Smith wrote:

 Hi,
 
 I'm trying to search a file for several matching blocks of text. A sample
 of what I'm searching through is below.
 
 What I want to do is match # START block # through to the next
 # END block # and repeat that throughout the file without
 matching any of the text that falls between each matched block (that is,
 the ok: some text lines should not be matched). Here is the one-liner I'm
 using:
 
 perl -p -e '/^# START block #.*# END block #$/s' file.txt
 
 I've tried a few variations of this but with the same result--a match is
 being made from the first # START block # to the last # END
 block #, and everything in between... I believe that the .*,
 combined with the s modifier, in the regex is causing this match to be
 made.

The '*' is what's called a greedy quantifier. That means it will match as 
many characters in the string as possible. What the regular expression engine 
does when it encounters the pattern '.*' is to immediately match it with as 
many characters as possible. Since your regular expression includes the 's' 
modifier, this will include newlines as well. When the RE engine sees that 
there are characters in the pattern after the '.*', it will start removing 
characters from the end of the substring matched by the '.*' until the 
subsequent pattern characters are also matched. This will continue until there 
are no characters matched by the '.*'.

The result of all this is that for your pattern, the last '# END block 
#' substring is the one that will be matched, and the '.*' pattern will 
match everything between the first '# START block #' and the last 
'# END block #'.

The way to fix this is to make the '*' quantifier non-greedy by putting a '?'
after it. With that pattern, the RE engine will match as few 
characters as possible, and the first START block will pair up with the first 
subsequent END block. A 'g' modifier will tell the RE engine to start looking 
after each match for the next match in the string.
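There is one more thing to consider. The one-liner uses `-p`, which hands Perl one line at a time, so a pattern meant to span lines never sees more than a single line. The non-greedy fix only helps once the whole text is in one string, for example after reading it with `local $/`. The sketch below builds such a string by hand from hypothetical sample data and counts the matches each form produces.

```perl
use strict;
use warnings;

# Hypothetical sample: two blocks with an unwanted line between them.
my $text = join '', map { "$_\n" }
  '# START block #', 'one', '# END block #',
  'ok: some text',
  '# START block #', 'two', '# END block #';

# Greedy: .* runs on to the LAST end marker, so one match covers both blocks
# and the 'ok: some text' line between them.
my @greedy = $text =~ /^# START block #.*# END block #/gms;

# Non-greedy: .*? stops at the FIRST end marker; /g restarts after each match.
my @lazy = $text =~ /^# START block #.*?# END block #/gms;

printf "greedy: %d match(es), non-greedy: %d match(es)\n",
  scalar @greedy, scalar @lazy;
print "---\n$_\n" for @lazy;
```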







Re: Regex sending me mad

2012-07-28 Thread Dr.Ruud

On 2012-07-27 17:43, Andy Bach wrote:

On Fri, Jul 27, 2012 at 10:22 AM, Dr.Ruud rvtol+use...@isolution.nl wrote:

On 2012-07-27 16:58, Andy Bach wrote:



   if ($model=~/(\S+)\s+(.*)\s*$/) {


The \s* in the end does nothing.


Well, I was thinking if it's a multi-word second match:
v6 Austin Martin<space><space>

Then that would match the rest of the phrase and trim trailing blanks.


The '.*' already picks up any trailing blanks. So they will be in $2.

But making it non-greedy, works:

perl -wle '
 /(\S+)\s+(.*?)\s*$/ and print "$1$2"
  for "v6 Aston Martin  ";
'
v6Aston Martin

(which surprised me, and I would never use it like that,
because for me it is not explicit enough)

--
Ruud






Re: Regex sending me mad

2012-07-27 Thread Jim Gibson

On Jul 27, 2012, at 7:04 AM, Gary Stainburn wrote:

 Hi folks.
 
 I'm struggling to see what I'm doing wrong.  I have the following code in one 
 of my programs but it isn't working as it should.
 
 
 print STDERR "enqmake='$enqmake' model='$model'\n";
 if (!$enqmake && $model) { # extract make
  print STDERR "About to split '$model'\n";
  if ($model=~/ *?(\w*) (.*?) *$/) {
$enqmake=lc($1);
$model=$2;
print STDERR "model split into '$enqmake' '$model'\n";
  }
 } # extract make
 
 This generates:
 
 enqmake='' model='Kia Venga'
 About to split 'Kia Venga'
 
 I have a test script which works fine. Can anyone see what I'm doing wrong?

No. Your script works fine for me if I precede it with the following two lines:

my $model = 'Kia Venga';
my $engmake;

 
 #!/usr/bin/perl -w
 
 use warnings;
 use strict;
 
 my $t='Kia Venga';
 
 if ($t=~/ *?(\w*) (.*?) *$/) {
  print "1='$1' 2='$2'\n";
 }
 
 [root@ollie exim]# ~/t
 1='Kia' 2='Venga'

Your test script also works fine. Therefore, it must be something else in your 
larger program.

I suggest you use the escape sequence \s for whitespace instead of just using 
the space character. You should also use the x modifier so that spaces in your 
pattern will be ignored. That will allow you to determine by inspection what 
your pattern is really doing. I also use m{ } to delineate the pattern and \z 
to anchor the end of match instead of $:

if( $model =~ m{ \s*? (\w*) \s (.*?) \s* \z }x ) {
  ...

Why aren't you using the split function?

($model,$engmake) = split(' ',$model);






Re: Regex sending me mad

2012-07-27 Thread Shawn H Corey
On Fri, 27 Jul 2012 07:29:13 -0700
Jim Gibson jimsgib...@gmail.com wrote:

 Why aren't you using the split function?
 
 ($model,$engmake) = split(' ',$model);

That would be:

($model,$engmake) = split(' ',$model, 2);

See `perldoc -f split` for details.
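`split` returns fields from left to right, so the make comes first. The list assignments quoted above would therefore put the make into `$model`. The sketch below assigns `($enqmake, $model)` in that order and lower-cases the make, as Gary's code does. The helper name `split_make_model` is hypothetical. With a limit, any trailing whitespace stays in the last field.

```perl
use strict;
use warnings;

# split ' ' (a one-space string) is the special awk-style mode: leading
# whitespace is skipped and any run of whitespace separates fields.
# The limit of 2 keeps the rest of the string (spaces included) as one field.
sub split_make_model {
    my ($text) = @_;
    my ( $make, $model ) = split ' ', $text, 2;
    return ( lc( $make // '' ), $model );
}

for my $input ( 'Kia Venga', 'Aston Martin DB9', "  Kia\tVenga" ) {
    my ( $enqmake, $model ) = split_make_model($input);
    print "make='$enqmake' model='$model'\n";
}
```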


-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

_Perl links_
official site   : http://www.perl.org/
beginners' help : http://learn.perl.org/faq/beginners.html
advance help: http://perlmonks.org/
documentation   : http://perldoc.perl.org/
news: http://perlsphere.net/
repository  : http://www.cpan.org/
blog: http://blogs.perl.org/
regional groups : http://www.pm.org/





Re: Regex sending me mad

2012-07-27 Thread Andy Bach
On Fri, Jul 27, 2012 at 9:04 AM, Gary Stainburn
gary.stainb...@ringways.co.uk wrote:
   print STDERR "About to split '$model'\n";
   if ($model=~/ *?(\w*) (.*?) *$/) {
 $enqmake=lc($1);
 $model=$2;
 print STDERR "model split into '$enqmake' '$model'\n";
   }
 } # extract make

 This generates:

 enqmake='' model='Kia Venga'
 About to split 'Kia Venga'

Your RE is a bit odd - all that 'non-greedy *' -ness implies troubles.
 The first space star ? can be greedy, right? You want all the
spaces/white space in a row, or rather don't want - as you're anchored
on the end, this doesn't do anything for the actual RE work. The next
word char * means zero or more - you want at least one, right? Word
char or non-white space?  The only requirement your RE looks for is
the single blank between capture 1 and 2 - so
Kia\tVenga

won't work.  Actually anything w/o a blank will fail ... don't really
know enough about your data but try maybe:
 if ($model=~/(\S+)\s+(.*)\s*$/) {
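The tab case is easy to check by running both patterns side by side. This is a small sketch with hypothetical inputs; `try_patterns` is a made-up helper. In list context, a failed match returns an empty list, which the sketch reports as "no match". The original pattern splits 'Kia Venga' but finds no match on "Kia\tVenga", while the suggested one handles both.

```perl
use strict;
use warnings;

# Run both patterns from this thread and return their captures
# (an empty list means "no match").
sub try_patterns {
    my ($input) = @_;
    my @original  = $input =~ / *?(\w*) (.*?) *$/;    # needs one literal space
    my @suggested = $input =~ /(\S+)\s+(.*)\s*$/;     # any run of whitespace
    return ( \@original, \@suggested );
}

for my $input ( 'Kia Venga', "Kia\tVenga" ) {
    my ( $orig, $sugg ) = try_patterns($input);
    ( my $shown = $input ) =~ s/\t/\\t/;    # make the tab visible
    printf "%-10s original: %-12s suggested: %s\n", $shown,
      ( @$orig ? "(@$orig)" : 'no match' ),
      ( @$sugg ? "(@$sugg)" : 'no match' );
}
```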



-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regex sending me mad

2012-07-27 Thread Dr.Ruud

On 2012-07-27 16:58, Andy Bach wrote:


  if ($model=~/(\S+)\s+(.*)\s*$/) {


The \s* in the end does nothing.

Closer:
/(\S+)\s+(.*\S)/
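You can check directly how the patterns discussed in this thread treat a value with trailing blanks. This is a small sketch with a hypothetical input; the angle brackets in the output only make the trailing spaces visible.

```perl
use strict;
use warnings;

my $input = "v6 Aston Martin  ";    # two trailing spaces

# (.*) is greedy: it swallows the trailing spaces and \s*$ matches nothing.
my ( undef, $greedy ) = $input =~ /(\S+)\s+(.*)\s*$/;

# (.*\S) must end on a non-space, so trailing blanks stay outside $2.
my ( undef, $anchored ) = $input =~ /(\S+)\s+(.*\S)/;

# (.*?) grows only until \s*$ can take the rest, i.e. the trailing blanks.
my ( undef, $lazy ) = $input =~ /(\S+)\s+(.*?)\s*$/;

print "greedy:   <$greedy>\n";
print "anchored: <$anchored>\n";
print "lazy:     <$lazy>\n";
```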


Then play with this:

perl -Mstrict -we'
  my $data= $ARGV[0] ? q{Ford} : qq{ \t Fiat Ulysse 2.1 TD};
  printf qq{%s %s\n}, split( q{ }, $data, 2 ), q{oops};
  printf qq{%s %s\n}, $data =~ / (\S+) \s* ( (?: .* \S )? )/x;
' 1

Ford oops
Ford 

--
Ruud






SOLVED Re: Regex sending me mad

2012-07-27 Thread Gary Stainburn
On Friday 27 July 2012 15:58:07 Andy Bach wrote:

 Your RE is a bit odd - all that 'non-greedy *' -ness implies troubles.
  The first space star ? can be greedy, right? You want all the
 spaces/white space in a row, or rather don't want - as you're anchored
 on the end, this doesn't do anything for the actual RE work. The next
 word char * means zero or more - you want at least one, right? Word
 char or non-white space?  The only requirement your RE looks for is
 the single blank between capture 1 and 2 - so
 Kia\tVenga

 won't work.  Actually anything w/o a blank will fail ... don't really
 know enough about your data but try maybe:
  if ($model=~/(\S+)\s+(.*)\s*$/) {

Thanks Andy, Shawn and Jim.

The regex I'd supplied was built up over many attempts to get it working, 
hence the over the top spec.

The problem eventually turned out to be that the space between the make and 
model wasn't actually a space, i.e. wasn't ASCII 32. I have now got the 
people generating the data to generate it correctly and all is now fine, with 
a much simpler regex.
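Gary doesn't say which character the separator turned out to be. One quick way to find such a character is to list every character outside printable ASCII, with its position and code point. This is a minimal sketch. The no-break space in the example input is a hypothetical stand-in. If the data is read as undecoded UTF-8 bytes, a single such character shows up as more than one entry.

```perl
use strict;
use warnings;

# Report every character outside printable ASCII (0x20-0x7E) with its
# position and code point, so a look-alike separator shows up even though
# it prints like an ordinary space.
sub odd_chars {
    my ($text) = @_;
    my @found;
    while ( $text =~ /([^\x20-\x7E])/g ) {
        push @found, sprintf 'pos %d: U+%04X', pos($text) - 1, ord $1;
    }
    return @found;
}

print "$_\n" for odd_chars("Kia\x{A0}Venga");    # hypothetical no-break space
```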

Gary


-- 
Gary Stainburn
Group I.T. Manager
Ringways Garages
http://www.ringways.co.uk 





Re: Regex sending me mad

2012-07-27 Thread Andy Bach
On Fri, Jul 27, 2012 at 10:22 AM, Dr.Ruud rvtol+use...@isolution.nl wrote:
 On 2012-07-27 16:58, Andy Bach wrote:

   if ($model=~/(\S+)\s+(.*)\s*$/) {


 The \s* in the end does nothing.

Well, I was thinking if it's a multi-word second match:
v6 Austin Martin<space><space>

Then that would match the rest of the phrase and trim trailing blanks.

 Closer:
 /(\S+)\s+(.*\S)/

Yeah, that's better - using the non-whitespace as anchors, so to speak!


-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk





Re: Regex character classes: n OR m

2012-07-06 Thread Paul Johnson
On Fri, Jul 06, 2012 at 06:59:00PM +0100, Adam J. Gamble wrote:
 Dear All,
 
 I'm taking a (highly belated) first look at Perl today. From a background
 in Python, I'm coming to Perl, primarily out of curiosity with what it can
 do with regular expressions.

Welcome!

 To get to the point— is it possible to match a character class with a
 repeater that requires an exactly *n* OR *m* matches, rather than the
 traditional *{n, m}*. I've taken a look at
 http://perldoc.perl.org/perlrequick.html#Using-character-classes, which
 implies this wouldn't be possible? But, putting faith Perl's reputation for
 inherent quirkiness... if possible, I'd love to know what a solution would
 look like?

You're correct that there is no way to do this directly, but if you look
at the section just below (Matching this or that) you can see the basis
for a solution.

So, to match either three or five "a"s, for example, you could do this:

  /^(?:a{3}|a{5})$/
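The same alternation works with a character class in place of the literal `a`. Here is a short sketch using the hypothetical class `[a-c]`. An equivalent form is `/^[a-c]{3}(?:[a-c]{2})?$/`, that is, three characters plus an optional two more.

```perl
use strict;
use warnings;

# Exactly 3 or exactly 5 characters from the class [a-c], nothing else.
# There is no {3|5} quantifier, so alternate two fixed counts inside (?:...).
my $three_or_five = qr/^(?:[a-c]{3}|[a-c]{5})$/;

for my $s (qw(abc abca abcab ab)) {
    printf "%-6s %s\n", $s, $s =~ $three_or_five ? 'match' : 'no match';
}
```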

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: Regex character classes: n OR m

2012-07-06 Thread Shawn H Corey
On Fri, 6 Jul 2012 18:59:00 +0100
Adam J. Gamble a.gam...@lucida.cc wrote:

 Dear All,
 
 I'm taking a (highly belated) first look at Perl today. From a
 background in Python, I'm coming to Perl, primarily out of curiosity
 with what it can do with regular expressions.
 
 To get to the point— is it possible to match a character class with a
 repeater that requires an exactly *n* OR *m* matches, rather than the
 traditional *{n, m}*. I've taken a look at
 http://perldoc.perl.org/perlrequick.html#Using-character-classes,
 which implies this wouldn't be possible? But, putting faith Perl's
 reputation for inherent quirkiness... if possible, I'd love to know
 what a solution would look like?

Try:  m{ (?: .{n} | .{m} ) }msx

Of course, replace the period with the character set you're looking for.
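Whether this form means "exactly n or m" depends on anchoring. Without anchors it also matches inside any longer run, so a string of four characters from the set still matches. Here is a small sketch with n=3, m=5 and the hypothetical class `[a-c]`.

```perl
use strict;
use warnings;

# n=3, m=5, with the class [a-c] in place of the period.
my $unanchored = qr{ (?: [a-c]{3} | [a-c]{5} ) }x;
my $anchored   = qr{ \A (?: [a-c]{3} | [a-c]{5} ) \z }x;

for my $s (qw(abc abca abcab)) {
    printf "%-6s unanchored: %-8s anchored: %s\n", $s,
      ( $s =~ $unanchored ? 'match' : 'no match' ),
      ( $s =~ $anchored   ? 'match' : 'no match' );
}
```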


-- 
Just my 0.0002 million dollars worth,
  Shawn





Re: Regex behavior in command line

2012-06-07 Thread Jon Forsyth
I overlooked the missing single quotes, Thanks!

-Jon

