Re: Regular Expressions

Chas. Owens Sun, 08 Feb 2009 09:32:41 -0800

On Sun, Feb 8, 2009 at 03:49, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> Chas. Owens wrote:
>>
>> On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson <nore...@gunnar.cc>
>> wrote:
>>>
>>> TMTOWTDI
>>>
>>>   use Time::Local;
>>>   while (<DATA>) {
>>>       s{,(.+?),}{
>>>           my ($d, $m, $y) = split /\//, $1;
>>>           my $t = timelocal 0, 0, 0, $d, $m-1, $y;
>>>           ($d, $m, $y) = (localtime $t)[3..5];
>>>           sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
>>>       }e;
>>>   }
>>
>> snip
>>
>> And this would be the confusing, fragile mess I spoke of.
>
> Sorry, but I fail too see how using the s/// operator to extract the date
> field would be so much more confusing and fragile compared to split() +
> join().
snip


You are calling three functions (one of which is split) and assigning
returns three times inside the replacement.  Add on top of that the
fact that the regex only works for the second field.  Compare all of
that to calling two much simpler functions, a simple substitution, and
one assignment.  Try to imagine what happens six months from now when
you need to go back and perform a transformation on the fifth field.
Are you going to extend the regex to try to capture that value?  Or
are you just going to rewrite the code to use a split like you should
have in the first place?  Also, there may be a need to handle commas
in the fields at some point in the future.  This will entail using a
module like Text::CSV.  With the split code you can just replace the
split with the proper parsing function from the module.  With the
giant substitution code you pretty much have to rewrite the whole
thing.

I am all for using advanced features of Perl when it makes the code
clearer or more concise, but this code is longer than the split
version, involves more functions (including the confusing* localtime
and timelocal functions), and doesn't even do error checking on the
data.

On an unrelated topic, why are you using timelocal?  A much better
solution is to use the strftime function from the POSIX module:

#!/usr/bin/perl

use strict;
use warnings;

use POSIX;

while (<DATA>) {
        s{,([^,]+),}{
                my ($m, $d, $y) = $1 =~ m<^([0-9]+)/([0-9]+)/([0-9]+)$>
                        or die "$. has an invalid date format";
                strftime ",%Y%m%d,", 0, 0, 0, $d, $m - 1, $y - 1900;
        }e;
        print;
}

__DATA__
1,1/1/2009,optional,foo
2,1/2/2009,,bar
3,1/3/2009,,baz


Note how the split from your code has been changed to a regex.  This
is because split is indiscriminate.  This was good in my code because
it acted as future proofing against more fields being added to the
record** (which is unlikely to affect the meaning of earlier fields),
but bad here because we know the expected format of the date and the
chances of it not being that format and the code still being correct
at some point in the future is small.

* localtime pretty much only makes sense when you know the C based tm
structure it came from and timelocal, besides being a word play that
is too clever by half, is worse because it violates that structure***.
** also, if we wanted to throw an error because there were too few or
too many fields it would be easily achieved by asking the array how
many elements it held.
*** http://perldoc.perl.org/Time/Local.html#Year-Value-Interpretation

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Re: Regular Expressions

Reply via email to