On Sun, Feb 04, 2018 at 02:27:00PM +0100, Nicolas George wrote:
Michael Stone (2018-02-04):
But a better parser would allow the same functionality, without being
confusing, inconsistent, and hard to maintain. So yes, I'll stand by
Can you describe what you mean by "better parser" in more detail?
Beware that the "same functionality" includes "same convenience".
Convenience is hard to achieve.
Well, it's not particularly convenient for people to have to constantly
wonder why the parser isn't doing what they think it should do. I've
been getting the questions and bug reports for 20 years, so trust me
when I say that people have trouble predicting the output of a given
As for "better parser", that means something that requires the input
to be fully specified, and does not try to guess based on natural
language parsing. For example, what does "last month" mean? What does it
mean when you're on the 31st and the previous month didn't have a 31st?
What date is 1/2? What time zone is "EST"? Making guesses seems
"convenient" but when you hit corner cases and things break horribly,
that's not convenient after all. Most date parsers address this by
requiring a format specifier along with the input, so you can say
something like "parse '1/2' assuming the input is
numericday/numericmonth". Is it less "convenient" to have to specify the
format? Maybe, but it's also a heck of a lot more reliable. Someone else
pointed out postgresql's date parser, which lets you do things like
specify a date and then add something like "interval '1 day'".
Specifying the fact that a particular string is an interval makes the
parsing much more regular than trying to pull the interval out of
natural language. At one point date would appear to properly parse
ISO8601 input (YYYY-mm-ddTHH:MM:SS) but it interpreted the "T" as a
timezone specifier instead of the ISO8601 delimiter. (Compare output
with YYYY-mm-ddUHH:MM:SS or YYYY-mm-ddSHH:MM:SS.) Why would it ever have
been "convenient" to put an alphabetic timezone specifier after
the date and before the time? Who knows, but the natural language parser
was doing its best to guess a meaning for the input. That particular
issue was fixed, but how can you tell whether you're using a version
that works the old way or the new way? (Answer: you can't easily do so.
If you had to specify a format, it would be easier to fail hard when
given a format that wasn't understood, rather than fail soft and produce
random output.) Is it "convenient" that there's a natural language
parser that only understands English? Maybe, if you speak English?
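
To make the format-specifier idea concrete, here is a rough sketch in
Python rather than anything date(1) itself does. The helper name
parse_date is hypothetical, and a year is added to the "1/2" example
purely so the result is a complete date. It shows the same string
yielding different dates under different explicit formats, an interval
expressed as its own typed value, the ISO 8601 "T" treated as a literal
delimiter, and a hard failure on input that doesn't match.

```python
from datetime import datetime, timedelta


def parse_date(text: str, fmt: str) -> datetime:
    """Parse text strictly according to fmt; raise ValueError on mismatch."""
    return datetime.strptime(text, fmt)


# The caller states the field order, so "1/2" is never guessed at.
day_first = parse_date("1/2/2018", "%d/%m/%Y")    # 1 February 2018
month_first = parse_date("1/2/2018", "%m/%d/%Y")  # 2 January 2018

# An interval is a separate typed value, not words inside the date string.
next_day = day_first + timedelta(days=1)

# The "T" is a literal delimiter in the format, not a timezone letter.
iso = parse_date("2018-02-04T14:27:00", "%Y-%m-%dT%H:%M:%S")

# Input that doesn't match the stated format fails hard.
try:
    parse_date("last month", "%d/%m/%Y")
    failed_hard = False
except ValueError:
    failed_hard = True
```

The design point is only that the ambiguity moves out of the parser and
into something the caller wrote down, so a mismatch becomes an error
instead of a guess.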