Sorry this is such a dense post, but this module spawned a lot of discussion, and deciding what counts as a simple format turned out to be not so simple. Anyway, if you want to play with this, make sure you look at the note about regenerating the DT::Locale data.
If people are okay with the general direction of the code then I will commit it to CVS (I would like to get some consensus on the name, but will start a separate thread for that).

Available from:
  http://www.limey.net/~fiji/perl/DateTime-Format-Simple-0.01.tar.gz
  http://www.limey.net/~fiji/perl/generate_from_icu (see Notes)

Notes:
- I use the locale to determine the meaning of \d\d-\d\d-\d\d. It can be ymd, dmy, or mdy. In order to make this work the DT::Locale tools/generate_icu needed a small change. Get the script above and re-run the generator. If this change meets with approval I will commit it.
- If the length of the year is <= 2 then I will use the base_year argument (defaults to the current year) to work out the appropriate century:

    my $base_century = int( $base_year / 100 ) * 100;
    $year += $base_century;
    $year -= 100 if $year - $base_year > 50;

Major omissions:
- No POD
- AM/PM & BC/AD are not localized
- BC/AD is not supported at all yet (neither are negative years)
- Only tests English and French(?) parsing
- Needs many more tests and in more languages (I actually need fluent speakers to tell me if the formats it parses are reasonable)

Interface:
- Only DT::F::Simple->parse_datetime( ... ) at the moment. I will add a new() and the ability to call through the returned object.
- The ... can either be a single argument giving the string to parse OR name => value pairs:
    string:    The string to parse
    locale:    The locale to parse in (defaults to root, which is en_US)
    time_zone: The default TZ of the returned DT object (if no TZ is specified in the string), defaults to 'floating'
    base_year: The year to use if the string does not specify one, defaults to the current year. It is also the reference point used when inferring a complete year from a 2 digit one.
    debug:     If true then it will print lots of info, defaults to false

Formats it should be able to parse:
- ISO8601 date (only in the format YYYY-MM-DDTHH:MM:SS.FFFF). The T separator is optional and may be replaced by zero or more spaces. Time fields may be omitted from the right (lowest precision part first), but all of the date must be given. The separators inside the date and time (i.e. the -s and :s) are also optional.
- HH:MM:SS.FFFFFFF AM/PM is parsed; again the highest precision parts may be omitted. AM/PM is optional; if not present it assumes 24 hour time. You may specify HH AM/PM, but in this case the AM/PM is required.
- Dates of the form Y+/M+/D+ with -, . or / as the separator. Also accepted are M/D/Y or D/M/Y depending on the locale. (Y/M/D is assumed if the first number is longer than 2 digits or the locale explicitly calls for YMD.)
- DD-MonthName-Y+, where MonthName is the locale appropriate string
- DD MonthName or MonthName DD with a year somewhere else in the string
- If there is a locale appropriate day name somewhere in the string it is used to validate the parsed date.
- Timezones are supported as GMT offsets, e.g. GMT+5:00 or +5:00. Only offsets from GMT or UTC are supported. The : is optional and you may omit the minutes. You may also provide seconds, but they must always be 00.
- Named time zones are fine; it will check against the current set of DT::TZ names (at runtime, so aliases are honored).
- You may use both offsets and parenthesized named TZs, so '-0600 (CST)' will work (assuming an alias for CST).
- It will ignore accents in languages when parsing the strings.

Please help:
- I am desperate for speakers of other languages to provide me with good test strings!
- If you want to mail me with more English test strings that would be great
- Suggestions of additional formats to parse would be greatly appreciated

Thanks for bearing with me!

-ben
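
A minimal standalone sketch of the two-digit year rule from the Notes, for experimenting without installing the module. The sub name expand_two_digit_year is made up for this example and is not part of DateTime::Format::Simple. It also assumes that "length of the year" means the number of characters in the year as written in the string.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DateTime::Format::Simple.
# $year is the year exactly as written in the string (e.g. "03" or "1999").
# $base_year is optional and falls back to the current year.
sub expand_two_digit_year {
    my ( $year, $base_year ) = @_;
    $base_year = (localtime)[5] + 1900 unless defined $base_year;

    # Assumption: years written with more than two digits are already complete.
    return $year + 0 if length($year) > 2;

    # The three lines quoted in the Notes.
    my $base_century = int( $base_year / 100 ) * 100;
    $year += $base_century;
    $year -= 100 if $year - $base_year > 50;
    return $year;
}

# Print a few inputs next to their expanded years.
printf "%s with base %d -> %d\n", @$_, expand_two_digit_year(@$_)
    for [ '03', 2003 ], [ '53', 2003 ], [ '54', 2003 ], [ '99', 2003 ], [ '01', 1999 ];
```

With base_year 2003 the first four inputs come out as 2003, 2053, 1954 and 1999. The rule only ever adjusts downward, so '01' with base_year 1999 comes out as 1901 rather than 2001.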
