On May 4, 2013, at 11:53 AM, Florian Huber wrote:

> On 05/04/2013 04:37 PM, Rob Dixon wrote:
>> On 04/05/2013 14:26, Florian Huber wrote: 
>>> 
>>> 
>> 
>> Hello Florian 
>> 
>> First a couple of points 
>> 
>> - Don't use named captures for simple regexes like this. They make the 
>> code harder to understand, and are really only useful when using complex 
>> patterns with multiple captures.
>> 
> 
> Well, to be honest, it's a bit more complicated than this - but I didn't want 
> to write too long an email. Anyway, it boils down to the following critical 
> piece in the logfile:
> 
> 
> # Move Polychroic Turret In To Place 
> SET_POLYCHROIC GFP/YFP
>  
> # Open image file 
> OPF 
> FILTERS YFP,YFP,100% 
> ACTSHUT 17 
> CCD 2.000000 
> WRT 
> FILTERS GFP,GFP,100% 
> ACTSHUT 17 
> AVG 4,1.000000 
> WRT 
> FILTERS POL,POL,100% 
> ACTSHUT 18 
> CCD 0.050000 
> WRT 
> FILTERS DAPI,GFP,100% 
> ACTSHUT 17 
> CCD 1.000000 
> WRT 
>  
> # Close image file(s)  
> CLF 
> BEEP
> 
> 
> So I actually need to capture the parameters from all the four filter sets 
> (this is a microscope logfile) - hence the named captures. Plus, the 
> parameters within one category may vary, so sometimes there will be an "AVG", 
> sometimes there won't - if I simply run a long regex over everything, I'm 
> afraid that one slight deviation will mess everything up. For example, I try 
> to capture (AVG\s\d) but then there is none - so if I do something like
> 
> my ($one, $AVG, $three) =~ m/(PATTERN1)(AVG\s\d)?(PATTERN3)/;

That will not do what you want: =~ binds a string to the match operator; it does
not assign the captures. You need to bind the string to be searched to the pattern
and assign the captures:

  my ($one, $AVG, $three) = ( $text =~ m/(PATTERN1)(AVG\s\d)?(PATTERN3)/ );

or let the match be applied to the default variable ($_) and assign (not bind) the
results:

  my ($one, $AVG, $three) = m/(PATTERN1)(AVG\s\d)?(PATTERN3)/;
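Here is a small runnable sketch of both forms. The sample line and the FILTERS
pattern are made up from your logfile excerpt purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample line modelled on the logfile above.
my $text = "FILTERS YFP,YFP,100%";

# Form 1: bind $text explicitly; the list assignment receives the captures.
my ($ex, $em, $pct) = ( $text =~ m/^FILTERS\s+(\w+),(\w+),(\d+)%/ );

# Form 2: the match is applied to $_ implicitly.
my ($ex2, $em2, $pct2);
for ($text) {    # aliases $_ to $text for the duration of the block
    ($ex2, $em2, $pct2) = m/^FILTERS\s+(\w+),(\w+),(\d+)%/;
}

print "$ex $em $pct\n";       # YFP YFP 100
print "$ex2 $em2 $pct2\n";    # YFP YFP 100
```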


> and one of the patterns is not there, still everything will be shifted to the 
> right, won't it? I figured that by naming the captures, I will always know if 
> there was a match at the very position intended.

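If the regular expression as a whole does not match, none of the variables will be
set (they will all be undef). If it does match but an optional group such as
(AVG\s\d)? did not take part, only that variable is undef: captures are numbered
by the position of their opening parenthesis in the pattern, so nothing shifts.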
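A quick sketch showing both cases. The pattern for one exposure block (shutter,
an optional "AVG n" step, then WRT) and the sample blocks are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical pattern: ACTSHUT, an optional AVG step, then WRT.
my $re = qr/(ACTSHUT\s\d+)\s+(AVG\s\d)?.*?(WRT)/s;

sub parse_block {
    my ($block) = @_;
    # List context: ($1, $2, $3) on success, an empty list on failure.
    return ( $block =~ $re );
}

my @with    = parse_block("ACTSHUT 17\nAVG 4,1.000000\nWRT\n");
my @without = parse_block("ACTSHUT 17\nCCD 2.000000\nWRT\n");
my @nomatch = parse_block("BEEP\n");

# Without AVG, slot 2 is undef but WRT still lands in slot 3.
for my $r ( \@with, \@without ) {
    print join( ' | ', map { defined $_ ? $_ : 'undef' } @$r ), "\n";
}
print scalar(@nomatch), " captures when the whole match fails\n";
```

This should print "ACTSHUT 17 | AVG 4 | WRT", then "ACTSHUT 17 | undef | WRT",
then "0 captures when the whole match fails".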

>> The pattern match 
>> 
>>    $+{'GFP'} =~ m/(?<AVG>AVG\s\d)/; 
>> 
>> is in void context (i.e. the result is being discarded). That is mostly 
>> equivalent to scalar context as far as operator behaviour is concerned. 
>> And you have made things worse by writing 
>> 
>>     $+{'GFP'} =~ scalar m/(?<AVG>AVG\s\d)/; 
>> 
>> which is equivalent to 
>> 
>>     $+{'GFP'} =~ ($_ =~ /(?<AVG>AVG\s\d)/); 
>> 
>> so it applies the pattern to the $_ variable, and uses the result of that 
>> match as another regex and applies that to $+{GFP}. 
>> 
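If the aim is to look for AVG inside a block you have already captured, something
along these lines should work. The sample text and the GFP pattern are
hypothetical; the points are to copy $+{GFP} into an ordinary variable first,
since the next successful match replaces the contents of %+, and to bind that
variable directly to the pattern without 'scalar':

```perl
use strict;
use warnings;

# Hypothetical excerpt of the logfile: the GFP block contains an AVG step.
my $log = "FILTERS GFP,GFP,100%\nACTSHUT 17\nAVG 4,1.000000\nWRT\n";

my $avg;
if ( $log =~ m/(?<GFP>FILTERS GFP,.*?WRT)/s ) {
    # Copy the named capture before running another match.
    my $gfp = $+{GFP};
    # Bind the copied string directly to the second pattern.
    if ( $gfp =~ m/(?<AVG>AVG\s\d)/ ) {
        $avg = $+{AVG};
    }
}
print defined $avg ? "AVG: $avg\n" : "no AVG\n";    # AVG: AVG 4
```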
> So what is the return value of the pattern match operator then? 0 if 
> successful, 1 if not? The number of matches?

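In scalar context, the match operator returns true (1) if the match was successful
and false (an empty string, which is also 0 numerically) if it was not. There is
only one "match"; the string either matches the pattern or it does not. There can
be multiple "captures". In list context, the captures are returned; a pattern with
no capture groups returns (1) on success, and a failed match returns an empty list.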
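A short demonstration of those return values, using a made-up CCD line in the
style of your log:

```perl
use strict;
use warnings;

my $line = "CCD 2.000000";

# Scalar context: true (1) on success, false ('') on failure.
my $hit  = ( $line =~ m/^CCD/ );
my $miss = ( $line =~ m/^AVG/ );

# List context with capture groups: the captured substrings.
my @caps  = ( $line =~ m/^(\w+)\s+([\d.]+)/ );    # ('CCD', '2.000000')

# List context without capture groups: (1) on success, () on failure.
my @plain = ( $line =~ m/^CCD/ );                 # (1)
my @none  = ( $line =~ m/^AVG/ );                 # ()

printf "hit=%s miss='%s' caps=%s plain=%s none=%d\n",
    $hit, $miss, join( ',', @caps ), join( ',', @plain ), scalar @none;
# hit=1 miss='' caps=CCD,2.000000 plain=1 none=0
```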


>> It would help to be able to see the format of your input data. If you 
>> know that reading the entire file in is a bad idea then you shouldn't be 
>> doing it. 
>> 
> 
> Well, the thing is, the logfiles are a few kb long - and reading everything 
> into one string does make some regexes easier, in my opinion. I tried first 
> with while loops but then I found it very difficult to discriminate between 
> recurring patterns and storing whatever is found in between. Maybe it's just 
> me because I'm using Perl only every now and then and there are cleverer ways 
> of doing that. In my case, I figured that the bit of time/RAM I lose is made 
> up by having easier regexes.
> 

Either way will work. Both have their challenges.
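For what it's worth, here is one way the line-by-line approach could look for
your log. It is only a sketch: it assumes each exposure block starts with a
FILTERS line and ends with WRT, and the hash layout (keyed by the two filter
names, e.g. "GFP/GFP") and the parse_log name are just my choices. A missing AVG
simply leaves no AVG key, so nothing shifts:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the commands following each FILTERS line
# into a hash keyed by filter set, reading one line at a time.
sub parse_log {
    my ($fh) = @_;
    my ( %params, $current );
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;    # drop trailing whitespace and newline
        if ( $line =~ m/^FILTERS\s+([^,]+),([^,]+),/ ) {
            $current = "$1/$2";    # e.g. "DAPI/GFP"
            $params{$current} = {};
        }
        elsif ( defined $current && $line =~ m/^(ACTSHUT|AVG|CCD)\s+(\S+)/ ) {
            $params{$current}{$1} = $2;    # an absent AVG is simply missing
        }
        elsif ( $line eq 'WRT' ) {
            undef $current;    # end of this exposure block
        }
    }
    return \%params;
}

# Usage with an in-memory filehandle holding part of the log above.
my $sample = <<'LOG';
FILTERS YFP,YFP,100%
ACTSHUT 17
CCD 2.000000
WRT
FILTERS GFP,GFP,100%
ACTSHUT 17
AVG 4,1.000000
WRT
LOG

open my $fh, '<', \$sample or die "Cannot open in-memory log: $!";
my $p = parse_log($fh);
close $fh;

for my $set ( sort keys %$p ) {
    my $h = $p->{$set};
    print "$set: ", join( ' ', map { "$_=$h->{$_}" } sort keys %$h ), "\n";
}
```

For a real file you would pass a handle from open my $fh, '<', $path instead,
where $path is wherever your logfile lives.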

