I had a need for doing this in a project of my own, so I have written a
method for my own code that will parse a range of inputs. I'm still
adding tests (and finding and fixing bugs) as of half an hour ago, so I'm
not calling this "done" in any ready-for-release sense.
The existence of strftime in DT::Incomplete that would format based on a
specifier, with fields not present in the object replaced by 'x's in the
output text, made the Incomplete format tempting enough to get me to use
it (I'm working on an archive of old photos, not just my own, where
sometimes we have only partial information about when a photo was taken).
My code is not an strptime equivalent: it doesn't take specific formats
and then parse them precisely. It's for the human-interface situation
where the user has provided a date and we want to try to understand it.
That's what I needed, so that's what I wrote. As such it's somewhat
America-centric, since that's where I'm using it, that's where the
photos I'm working with are mostly from, that's where the people I'm
working with are from. But it doesn't guess aa/bb/yyyy dates; if you
want to use them you have to specify which format you mean, m/d/y or
d/m/y, when you instantiate the object. So for my purposes, yeah, I say
American date format, but it's not even the default, just an option. It
also supports timezone abbreviations, but from a list you can provide in
the constructor, not hard-coded to American ones (Yeah, I know there are
conflicting abbreviations in use in various parts of the world, so no
module supporting the world can depend on knowing what they mean;
however, people in America, if they specify time zones at all, do it as
"CDT", "EST", and so forth, and I needed to handle that.)
So... if there's any interest in having this available as part of
DT::Incomplete, I could possibly work on integrating it, and generate a
patch or pull request or whatever people want to make that easy. But if
it's too far afield I don't want to deal with that learning curve (never
actually done a pull request before; lots of git at work and on my own
projects but we never used the pull-request mechanism, and my experience
contributing to free software consists of sending Henry Spencer a C-News
patch once long ago) and do that work and then have it crash and burn.
Better to talk first.
When did named capture groups appear in Perl? Was that 5.20, or
somewhere earlier? This requires them.
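For reference, named captures and the %+ hash were introduced in Perl 5.10, and fc() (which the code below also uses) needs Perl 5.16 with "use feature 'fc'". Here is a minimal standalone sketch of the named-capture mechanism. The pattern and the capture_fields() helper are made up for illustration and are much simpler than the real regexps further down.

```perl
use strict;
use warnings;

# Hypothetical, simplified pattern; the real ones also accept x'ed-out fields.
my $iso = qr/(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})/;

# Hypothetical helper: returns a hashref of the named captures, or undef
# if the input does not match.
sub capture_fields {
    my ($input) = @_;
    return undef unless $input =~ /^\s*$iso\s*$/;
    my %fields = %+;    # copy now, before another match overwrites %+
    return \%fields;
}

my $f = capture_fields('1987-06-05');
print join(', ', map { "$_=$f->{$_}" } sort keys %$f), "\n";
```

Only groups that actually took part in the match show up in %+, which is why the parse method can test for a field with exists().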
Here's the pod and code for my parse method (from my own
Config::Hierarchy::Type::DateTime class) so anybody who cares can have
at least a bit of an idea what I'm talking about. It uses two attributes
of my object (slashDate and zoneAbbreviation) that would have to be dealt
with some other way when integrating with DT::Incomplete: either remove
the features they control, or make them parameters to the parse method
(or whatever it would be called), I guess.
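To give an idea of what those two attributes control, here is a small standalone sketch. The helper names, and the contents of the abbreviation table (including which zone names the abbreviations map to), are assumptions made up for illustration; they are not taken from my class.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which of the two leading numbers is the month.
# $mode mirrors the slashDate attribute: 'none', 'monthday', or 'daymonth'.
sub resolve_slash_date {
    my ($mode, $pair1, $pair2, $year) = @_;
    die "Slashed dates not accepted\n" if $mode eq 'none';
    die "Unknown slashDate mode '$mode'\n"
        unless $mode eq 'monthday' || $mode eq 'daymonth';
    my ($month, $day) =
        $mode eq 'monthday' ? ($pair1, $pair2) : ($pair2, $pair1);
    return (year => $year, month => $month, day => $day);
}

# Hypothetical table of the kind passed in as zoneAbbreviation.
# Keys are lower case; the values here are only an example mapping.
my %zone_abbreviation = (
    cst => 'America/Chicago',
    cdt => 'America/Chicago',
    est => 'America/New_York',
    edt => 'America/New_York',
);

# Hypothetical helper: case-insensitive lookup that dies on unknown input.
sub resolve_zone {
    my ($abbr, $table) = @_;
    my $key = lc $abbr;
    die "Unknown time zone abbreviation '$abbr'\n"
        unless exists $table->{$key};
    return $table->{$key};
}

my %us = resolve_slash_date('monthday', 7, 4, 1976);
my %uk = resolve_slash_date('daymonth', 7, 4, 1976);
print "monthday reading: month $us{month}, day $us{day}\n";
print "daymonth reading: month $uk{month}, day $uk{day}\n";
print "CDT means ", resolve_zone('CDT', \%zone_abbreviation), "\n";
```

The point of both is the same: nothing is guessed, and anything the caller has not explicitly configured is rejected.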
=head3 parse()
Given a string, parses it and returns the value for the type, a
reference to a DateTime::Incomplete object.
Arguments:
=over
=item string
The input data
=back
Returns: Value
Dies on error
slashDate controls what aa/bb/yyyy means (nothing, or m/d/y, or
d/m/y). zoneAbbreviation defines the time zone abbreviations we accept,
so that can be customized to local needs (because real people mostly
don't use an internationally unambiguous way to indicate time zones).
We accept dates, or dates and times. The time can be separated from
the date with whitespace, or with the letter "T" (I<a la> ISO 8601).
The date can be in any of the following formats:
=over
=item iso8601 date
That is, 'YYYY-MM-DD'.
=item slashed date
(Unless slashDate is set to its default value of 'none').
If slashDate is 'monthday' then the format is 'mm/dd/yyyy'. This is
the American standard.
If slashDate is 'daymonth' then the format is 'dd/mm/yyyy'. This is
common in most of the rest of the world.
=item TOPS-20 date
An unambiguous and pretty date format. It uses the short name of the
month rather than a number, and that avoids possible confusion between
day number and month number. The format is 'dd-mon-yyyy'.
=back
Slashed dates can have 2-4 digit years; the other formats require 4
digits. I don't do anything with short years: whatever happens in the
DateTime::Incomplete constructor happens. (There are also no provisions
for negative years or for specifying an era, BC or AD.)
The time is represented as 'HH:MM', as a minimum.
It can be as complicated as 'HH:MM:SS.NNNNNN PM +0500'.
If 'am' or 'pm' is present after the time digits, then the hours cannot
exceed 12. If a timezone name or offset is present it is used,
otherwise a floating time is created.
=cut
# Regexps defined outside the function for performance.
# The regexps separate the chunks out but don't handle full validation,
# many things must be checked or even disambiguated later.
# Date and time fields are mostly digits. They can be either a group
# of digits, or a group of x's. The whole field must be x'ed.
our $digit = qr/\d|x/;
# In the format that allows month names, you can also x it out
our $monthName = qr/jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|xxx/i;
# In an ideal universe this would be locale-driven or something
our %monthConvert = (
jan => 1,
feb => 2,
mar => 3,
apr => 4,
may => 5,
jun => 6,
jul => 7,
aug => 8,
sep => 9,
oct => 10,
nov => 11,
dec => 12,
xxx => undef,
);
our %monthDays = (
1 => 31,
2 => 29, # One more thing not checked, leap year!
3 => 31,
4 => 30,
5 => 31,
6 => 30,
7 => 31,
8 => 31,
9 => 30,
10 => 31,
11 => 30,
12 => 31,
);
# Date can be in iso format, slashed format, or tops20 format. The
# slashed format is ambiguous between American and everywhere else
# usage for some dates, and which are interpreted how and which are
# allowed at all is a matter for software later.
our $iso = qr/(?<year>$digit{4})-(?<month>$digit{1,2})-(?<day>$digit{1,2})/i;
our $slashdate =
qr{(?<pair1>$digit{1,2})/(?<pair2>$digit{1,2})/(?<year>$digit{2,4})};
our $tops20 =
qr/(?<day>$digit{1,2})-(?<monthName>$monthName)-(?<year>$digit{4})/i;
# Not super-smart, counting on underlying constructor to do actual
# validation. Accepts timezone abbreviations of 2-5 letters, names of the
# form area/city, or utc or gmt offsets of plus or minus 4 digits possibly
# followed by ".5". So this accepts everything in the big list I know of,
# and shouldn't get confused with other elements of the time string.
#our $timezone = qr/(?<time_zone>(([-+]\d{4}(\.5)?)|[a-z]+\/[a-z]+))/i;
our $timezone =
qr/(?<zone_offset>[-+]$digit{4}(\.5)?)|(?<zone_abbr>[a-z]{2,5})|(?<zone_name>[a-z]+\/[a-z]+)/i;
# Handles time down to nanoseconds, with a timezone, can have am/pm.
our $time =
qr/(?<hour>$digit{1,2}):(?<minute>$digit{1,2})(:(?<second>$digit{1,2})(\.(?<nanosecond>$digit{1,9}))?)?\s*((?<ampm>am|pm))?(\s+($timezone))?/i;
sub parse {
my ($self, $input) = @_;
my $logger = Log::Log4perl->get_logger('config.hierarchy.type.datetime');
if ($input !~ qr/^\s*($iso|$slashdate|$tops20)((\s+|T)($time))?\s*$/) {
$logger->logcroak ("No parseable date in '$input'");
}
# All the goodies are in %+
my %fields = %+; # Where we can't accidentally step on them
# We'll manipulate the fields and then pass it as the
# DateTime::Incomplete constructor params.
# Deal with slashdate format
if ($self->slashDate eq 'none' &&
(exists($fields{pair1}) || exists($fields{pair2}))) {
$logger->logcroak("Slashed date format '$input' not accepted");
}
# The regexp "can't" find pair1 but not pair2, nor vice-versa,
# so the above is overkill? Similarly pair1 and pair2 can't exist
# at the same time as day and month.
if (exists($fields{pair1})) {
if ($self->slashDate eq 'monthday') {
$fields{month} = $fields{pair1};
$fields{day} = $fields{pair2};
} elsif ($self->slashDate eq 'daymonth') {
$fields{day} = $fields{pair1};
$fields{month} = $fields{pair2};
}
delete($fields{pair1});
delete($fields{pair2});
}
# Handle monthName if present
if (exists($fields{monthName})) {
$fields{month} = $monthConvert{fc($fields{monthName})};
delete($fields{monthName});
}
# Timezone offset or name needs to be called time_zone
if (exists($fields{zone_offset})) {
$fields{time_zone} = $fields{zone_offset};
delete($fields{zone_offset});
}
if (exists($fields{zone_name})) {
$fields{time_zone} = $fields{zone_name};
delete($fields{zone_name});
}
# Translate zone_abbr if present
if (exists($fields{zone_abbr})) {
if (!$self->has_zoneAbbreviation ||
!(exists($self->zoneAbbreviation->{fc($fields{zone_abbr})}))) {
$logger->logcroak ("Unknown time zone abbreviation in '$input'");
}
$fields{time_zone} = $self->zoneAbbreviation->{fc($fields{zone_abbr})};
delete($fields{zone_abbr});
}
# Handle am/pm
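if (exists($fields{ampm})) {
# An x'ed-out hour can't be range-checked or shifted
if (exists($fields{hour}) && $fields{hour} =~ /^\d+$/) {
if ($fields{hour} > 12) {
$logger->logcroak("Hour cannot exceed 12 when 'am' or 'pm' present in '$input'");
}
# 12 am is hour 0 and 12 pm is hour 12, so fold 12 down to 0 first
$fields{hour} %= 12;
if (fc($fields{ampm}) eq 'pm') {
$fields{hour} += 12;
}
}
# ampm is not one of the fields the constructor expects, so drop it
delete($fields{ampm});
}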
# Handle nanoseconds
if (exists($fields{nanosecond}) && $fields{nanosecond} =~ /^\d+$/) {
$fields{nanosecond} .= ('0' x (9-length($fields{nanosecond})));
}
# Remove x'ed out fields (defaults to undef anyway)
# This can leave partially x'ed out fields. Don't do that; it hurts
# when you do that.
for (keys(%fields)) {
if (!defined($fields{$_}) || $fields{$_} =~ qr/^x+$/) {
delete ($fields{$_});
}
}
# DateTime::Incomplete constructor does not reject impossible days,
# months, etc., so check those here.
$logger->logcroak ("Month exceeds 12 in '$input'") if ($fields{month} // 0) > 12;
$logger->logcroak ("Hour exceeds 23 in '$input'") if ($fields{hour} // 0) > 23;
$logger->logcroak ("Minute exceeds 59 in '$input'") if ($fields{minute} // 0) > 59;
$logger->logcroak ("Second exceeds 59 in '$input'") if ($fields{second} // 0) > 59;
$logger->logcroak ("Nanoseconds must be 9 digits in '$input'") if length($fields{nanosecond} // '000000000') != 9;
if (exists($fields{day})) {
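# Numify the month so '02' finds key 2; with no month known, allow up to 31
my $monthLimit = defined($fields{month}) ? ($monthDays{$fields{month} + 0} // 31) : 31;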
if ($fields{day} > $monthLimit) {
$logger->logcroak ("Day of month exceeds $monthLimit in '$input'");
}
}
local $Data::Dumper::Terse = 1; # don't output names where feasible
local $Data::Dumper::Indent = 0; # turn off all pretty print
$logger->trace("incomplete date params: ", Dumper(\%fields));
# Okay, close enough, blast it at the constructor and let that
# figure out whatever else may still be wrong with it.
return DateTime::Incomplete->new(%fields);
} # parse()
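One detail worth spelling out separately is the 12-hour to 24-hour mapping, since 12 am and 12 pm are the easy cases to get wrong. This is a minimal standalone sketch; to_24_hour() is a hypothetical helper for illustration, not part of the method above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 12-hour clock reading to a 24-hour hour.
sub to_24_hour {
    my ($hour, $ampm) = @_;
    die "Hour cannot exceed 12 when am/pm is present\n" if $hour > 12;
    $hour %= 12;                          # fold 12 down to 0 first
    $hour += 12 if lc($ampm) eq 'pm';     # then shift afternoon hours
    return $hour;
}

for my $case ([12, 'am'], [1, 'am'], [12, 'pm'], [11, 'PM']) {
    my ($hour, $ampm) = @$case;
    printf "%2d %s -> %2d\n", $hour, $ampm, to_24_hour($hour, $ampm);
}
```

Folding 12 to 0 before adding 12 means midnight comes out as 0 and noon as 12, so the later "hour exceeds 23" check never trips on a valid time.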
--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Words Over Windows http://WordsOverWindows.dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/