On Thu, Jul 3, 2008 at 7:04 PM, Dan Brickley <[EMAIL PROTECTED]> wrote:
> Breton Slivka wrote:
>
>> I offer the challenge to those developers: If you sincerely believe
>> that simple internationalized date parsing is an unsolvable or
>> difficult problem (which, as I have pointed out has been solved
>> numerous times already, with two examples), please present your
>> evidence. Why is avoiding this work more important than Accessibility?
>> Why is avoiding this work more important than avoiding hidden
>> metadata?
>
> Imagine the English language permutations of "Tuesday the fourteenth of July,
> next year" in terms of word order. Then allow for all natural languages (in
> all written scripts). And don't forget we use a variety of calendars. Big
> job. In theory it could be attempted; but the culture around here is averse
> to 'theoretical' solutions.
>

Once again this straw man is trotted out. Who is discussing this type of
solution, other than to discredit the approach as too hard? I certainly
am not suggesting this kind of wide-ranging natural language parser, and
I haven't seen anyone else seriously suggesting it. It's a foolish
undertaking, and it's obviously a foolish undertaking. Then WHY OH WHY
does it keep being brought up as though it were being seriously
discussed? Where does this idea keep popping out from?

Let me give an example in pseudocode of a parser that would work, would
be simple to write, and whose format could be read by a screen reader.

    function parser(datestring, locale) {
        en_months = [January, February, March, April, May, June, July,
                     August, September, October, November, December]

        // One strict pattern per locale; the ordinal suffix is matched
        // but not captured.
        if locale === "en-us"
            dateparse[month, day, year] = regex(datestring,
                "([A-Za-z]+) ([1-3]?[0-9])(?:st|nd|rd|th), ([0-9]{1,4})");
        if locale === "en-au"
            dateparse[day, month, year] = regex(datestring,
                "([1-3]?[0-9])(?:st|nd|rd|th) ([A-Za-z]+), ([0-9]{1,4})");
        if locale === "en-uk"
            dateparse[day, month, year] = regex(datestring,
                "([1-3]?[0-9])(?:st|nd|rd|th) ([A-Za-z]+), ([0-9]{1,4})");

        // Turn the month name into a number for any English locale.
        if locale.contains("en")
            dateparse.month = en_months.indexOf(dateparse.month);

        return dateparse AS [year, month, day];
    }

This is a simple example. There may be better techniques for doing this
than regexes (or not), but the point is that you can make a human
READABLE format without having to cover the whole spectrum of human
expression. Instead, you have ONE precise format for US dates, ONE
precise format for UK dates, ONE precise format for Japanese dates, and
so on.
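
For anyone who wants to try it, here is roughly how the pseudocode above might look as runnable JavaScript. Treat it as a minimal sketch under assumptions, not a reference implementation: the names parseLocalDate, FORMATS and EN_MONTHS are made up for illustration, the exact punctuation of each locale's format is only my guess at what an agreed format could be, and unlike the pseudocode it returns a 1-based month.

```javascript
// Month names for English locales; index 0 is January.
const EN_MONTHS = [
  "January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December",
];

// Hypothetical table: exactly ONE strict pattern per locale, plus the
// order in which its capture groups appear.
const DAY_FIRST = {
  re: /^([1-3]?[0-9])(?:st|nd|rd|th) ([A-Za-z]+), ([0-9]{1,4})$/,
  order: ["day", "month", "year"],
};
const FORMATS = {
  "en-us": {
    re: /^([A-Za-z]+) ([1-3]?[0-9])(?:st|nd|rd|th), ([0-9]{1,4})$/,
    order: ["month", "day", "year"],
  },
  "en-au": DAY_FIRST,
  "en-uk": DAY_FIRST,
};

// Returns [year, month, day] with a 1-based month, or null when the
// string does not match the single format defined for that locale.
function parseLocalDate(datestring, locale) {
  const format = FORMATS[locale.toLowerCase()];
  if (!format) return null; // locale not in the table

  const match = format.re.exec(datestring.trim());
  if (!match) return null; // not in the agreed format

  // Map capture groups to named parts according to the locale's order.
  const parts = {};
  format.order.forEach((name, i) => {
    parts[name] = match[i + 1];
  });

  const month = EN_MONTHS.indexOf(parts.month) + 1; // 0 means unknown name
  const day = Number(parts.day);
  const year = Number(parts.year);
  if (month === 0 || day < 1 || day > 31) return null;

  return [year, month, day];
}
```

Calling parseLocalDate("July 14th, 2009", "en-us") and parseLocalDate("14th July, 2009", "en-uk") both give [2009, 7, 14], while free-form text such as "Tuesday the fourteenth of July, next year" is simply rejected with null. The sketch does not check that the ordinal suffix agrees with the day ("14st" is accepted) or that the day exists in the given month; a real parser would want both checks.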

You stick this format of date in the title of an ABBR, and you can say
whatever you want about the date, in whatever language you like, in the
contents of the ABBR. The parser shouldn't care about the contents. It's
just looking at the title. It already is. The only change from the
current pattern is that we'd be using a less geeky and obscure format
than ISO 8601. The lang attribute of the ABBR element tells the parser
which format is in use.

Honestly, how difficult is it for a parser author to collect one format
for each locale? I've seen far more heroic efforts on simpler things.
How difficult is it for content publishers to learn ONE format (the one
for their own locale)? We're already asking them to learn a more
difficult format!

Yes, it's more complicated than parsing ISO 8601. But it's not boiling
the ocean. This isn't a binary decision we're facing. It's not a choice
between "I could implement it in an hour" simplicity and "human level"
AI. Compromise has to be made if we are to make any progress.
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss