On May 4, 2013, at 11:53 AM, Florian Huber wrote:

> On 05/04/2013 04:37 PM, Rob Dixon wrote:
>> On 04/05/2013 14:26, Florian Huber wrote:
>>>
>>
>> Hello Florian
>>
>> First a couple of points
>>
>> - Don't use named captures for simple regexes like this. They make the
>> code harder to understand, and are really only useful when using complex
>> patterns with multiple captures
>>
>
> Well, to be honest, it's a bit more complicated than this - but I didn't want
> to write too long an email. Anyway, it boils down to the following critical
> piece in the logfile:
>
> # Move Polychroic Turret In To Place
> SET_POLYCHROIC GFP/YFP
>
> # Open image file
> OPF
> FILTERS YFP,YFP,100%
> ACTSHUT 17
> CCD 2.000000
> WRT
> FILTERS GFP,GFP,100%
> ACTSHUT 17
> AVG 4,1.000000
> WRT
> FILTERS POL,POL,100%
> ACTSHUT 18
> CCD 0.050000
> WRT
> FILTERS DAPI,GFP,100%
> ACTSHUT 17
> CCD 1.000000
> WRT
>
> # Close image file(s)
> CLF
> BEEP
>
> So I actually need to capture the parameters from all the four filter sets
> (this is a microscope logfile) - hence the named captures. Plus, the
> parameters within one category may vary, so sometimes there will be an "AVG",
> sometimes there won't - if I simply run a long regex over everything, I'm
> afraid that one slight deviation will mess everything up. For example, I try
> to capture (AVG\s\d) but then there is none - so if I do something like
>
> my ($one, $AVG, $three) =~ m/(PATTERN1)(AVG\s\d)?(PATTERN3)/;

That is not syntactically correct. You need to bind the string to be
searched to the pattern and assign the captures:

    my ($one, $AVG, $three) = ( $text =~ m/(PATTERN1)(AVG\s\d)?(PATTERN3)/ );

or let the bind be done to the default variable ($_) and assign (not bind)
the results:

    my ($one, $AVG, $three) = m/(PATTERN1)(AVG\s\d)?(PATTERN3)/;

> and one of the patterns is not there, still everything will be shifted to the
> right, won't it? I figured that by naming the captures, I will always know if
> there was a match at the very position intended.

If the regular expression as a whole does not match, then none of the
variables will be set. Nothing is shifted, though: captures are numbered by
the position of their opening parenthesis, so if the optional (AVG\s\d)?
group does not take part in an otherwise successful match, $AVG is simply
undef and $three still holds the PATTERN3 capture.

>> The pattern match
>>
>> $+{'GFP'} =~ m/(?<AVG>AVG\s\d)/;
>>
>> is in void context (i.e. the result is being discarded). That is mostly
>> equivalent to scalar context as far as operator behaviour is concerned.
>> And you have made things worse by writing
>>
>> $+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/;
>>
>> which is equivalent to
>>
>> $+{'GFP'} =~ ($_ =~ /(?<AVG>AVG\s\d)/);
>>
>> so it applies the pattern to the $_ variable, and uses the result of that
>> match as another regex and applies that to $+{GFP}.
>>
> So what is the return value of the pattern match operator then? 0 if
> successful, 1 if not? The number of matches?

In scalar context, the match operator returns true (1) if the match was
successful and false (the empty string, which is 0 in numeric context) if it
was not. There is only one "match"; the string either matches the pattern or
it does not. There can be multiple "captures". In list context, a successful
match returns the captures ($1, $2, ...) and a failed match returns an empty
list.

>> It would help to be able to see the format of your input data. If you
>> know that reading the entire file in is a bad idea then you shouldn't be
>> doing it.
>>
>
> Well, the thing is, the logfiles are a few kb long - and reading everything
> into one string does make some regexes easier, in my opinion.
> I tried first with while loops but then I found it very difficult to
> discriminate between recurring patterns and storing whatever is found in
> between. Maybe it's just me because I'm using Perl only every now and then
> and there are cleverer ways of doing that. In my case, I figured that the
> bit of time/RAM I lose is made up by having easier regexes.
>

Either way will work. Both have their challenges.
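A minimal sketch of the points above, run against the log excerpt quoted in the thread. It shows the scalar and list return values of m//, a whole-string regex applied with /g where the optional CCD and AVG groups keep their capture numbers, and a line-by-line loop that collects any "KEY value" lines per filter set. It assumes Perl 5.10 or later (for //) and that every filter set has a FILTERS line, an ACTSHUT line and a closing WRT; the variable names ($hit, $miss, @caps, @blocks, @sets, $current) are hypothetical, and a real script would read the file through a filehandle rather than split a heredoc.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Excerpt of the logfile quoted above (one image file, four filter sets).
my $log = <<'END_LOG';
# Open image file
OPF
FILTERS YFP,YFP,100%
ACTSHUT 17
CCD 2.000000
WRT
FILTERS GFP,GFP,100%
ACTSHUT 17
AVG 4,1.000000
WRT
FILTERS POL,POL,100%
ACTSHUT 18
CCD 0.050000
WRT
FILTERS DAPI,GFP,100%
ACTSHUT 17
CCD 1.000000
WRT

# Close image file(s)
CLF
END_LOG

# 1. Return values of m// without /g.
my $hit  = ( 'AVG 4' =~ /AVG\s\d/ );          # scalar context: 1
my $miss = ( 'CCD 2' =~ /AVG\s\d/ );          # scalar context: '' (false)
my @caps = ( 'AVG 4,1.0' =~ /(AVG)\s(\d)/ );  # list context: the captures

# 2. Whole-string approach: one regex per filter set, repeated with /g.
#    Captures are numbered by their opening parenthesis, so an optional
#    group that does not participate is undef and nothing shifts.
my @blocks;
while ( $log =~ m{
            ^FILTERS \s+ (\S+) \n     # filter set: capture 1
            ACTSHUT  \s+ (\d+) \n     # shutter number: capture 2
            (?: CCD  \s+ (\S+) \n )?  # CCD exposure, optional: capture 3
            (?: AVG  \s+ (\S+) \n )?  # AVG parameters, optional: capture 4
            WRT \n
        }gmx )
{
    push @blocks, { filters => $1, shutter => $2, ccd => $3, avg => $4 };
}

# 3. Line-by-line approach: start a record at each FILTERS line and store
#    every "KEY value" line until the next WRT, whatever the keys are.
#    (With a real file: while (my $line = <$fh>) { chomp $line; ... })
my ( @sets, $current );
for my $line ( split /\n/, $log ) {
    if ( $line =~ /^FILTERS\s+(\S+)/ ) {
        $current = { FILTERS => $1 };
        push @sets, $current;
    }
    elsif ( $line =~ /^WRT\b/ ) {
        undef $current;    # this filter set is complete
    }
    elsif ( $current && $line =~ /^([A-Z]+)\s+(\S+)/ ) {
        $current->{$1} = $2;
    }
}

print "hit=$hit miss='$miss' caps=@caps\n";
for my $blk (@blocks) {
    printf "%-14s shutter=%s ccd=%s avg=%s\n",
        $blk->{filters}, $blk->{shutter},
        $blk->{ccd} // '-', $blk->{avg} // '-';
}
for my $s (@sets) {
    print join( ' ', map { $_ . '=' . $s->{$_} } sort keys %$s ), "\n";
}
```

The single regex is compact but fails for a block containing a line it does not expect; the line-by-line loop accepts any extra parameter lines, at the cost of a little bookkeeping with $current.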