Back to the subject of how to handle +-HH:MM, I think the only really viable candidates are %z and %:z, so the question boils down to whether, with strptime, we care more about consistency with GNU / glibc's strptime (which apparently does implement %z to cover both HHMM and HH:MM) or about users being able to specify *exactly* the string they want to match (e.g. allowing users to specify that a colon found in a time zone offset is an error condition).
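To make the two options concrete, here is a rough sketch of the trade-off. It does not use strptime at all; the three regexes and the parse_offset helper are hypothetical names made up for illustration, not CPython's actual _strptime internals or any proposed patch.

```python
import re
from datetime import timedelta, timezone

# Hypothetical patterns for illustration only (not CPython's real regexes).
# Colon-free form: what %z accepts as described in this thread.
STRICT_Z = re.compile(r"(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})")
# Colon-only form: what a separate %:z directive could accept.
STRICT_COLON_Z = re.compile(r"(?P<sign>[+-])(?P<hh>\d{2}):(?P<mm>\d{2})")
# Either form: what %z would accept if its definition were widened.
LENIENT_Z = re.compile(r"(?P<sign>[+-])(?P<hh>\d{2}):?(?P<mm>\d{2})")


def parse_offset(text, pattern):
    """Return a fixed-offset timezone, or raise ValueError on a mismatch."""
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError(f"offset {text!r} does not match {pattern.pattern!r}")
    delta = timedelta(hours=int(match["hh"]), minutes=int(match["mm"]))
    return timezone(-delta if match["sign"] == "-" else delta)


if __name__ == "__main__":
    # The widened pattern accepts both spellings of the same offset.
    print(parse_offset("+0530", LENIENT_Z))
    print(parse_offset("+05:30", LENIENT_Z))
    # The strict patterns let a caller treat the "wrong" spelling as an error.
    try:
        parse_offset("+05:30", STRICT_Z)
    except ValueError as exc:
        print("rejected:", exc)
```

With separate patterns the caller decides whether a colon is an error; with the widened pattern that decision is no longer available from the format alone.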
I'm slightly leaning towards %:z because changing the semantics of %z could be construed as a backwards-incompatible change (albeit a minor one). I know some people have been asking for a "strict" version of the dateutil parser, and people do tend to use parsers for string validation. Adding the %:z option has the advantage that it's unambiguously backwards compatible, and it can be added to strftime if that is deemed desirable.

Best,
Paul

On 10/21/2017 09:07 AM, Mario Corchero wrote:
> Sorry, hit send by mistake on the previous message.
>
>> That is fine for parsing, but my issue with this is symmetry with strftime.
>
> I can agree with having a %:z for support in strftime but I think that is a
> separate change. The issue I opened with the attached PR focused only on
> strptime to facilitate the discussion.
>
>> Again, what is the alternative?
>
> Making %z accept time-offset rfc3339 compatible.
>
>> I have a working strptime:
>
> Ouch, except for the fractional seconds (which was not part of the issue
> raised) I had also a patch for the colon and another for supporting 'Z' as
> reported in the bug tracker. I was mentioning working with Paul in the
> implementation of isoparse, as even if it might look simple it has caused
> many long-standing discussions in the past.
>
> On 21 October 2017 at 13:55, Mario Corchero <[email protected]> wrote:
>
>> On 21 October 2017 at 13:18, Oren Tirosh <[email protected]> wrote:
>>
>>> On Sat, 21 Oct 2017 at 13:24, Mario Corchero <[email protected]> wrote:
>>>
>>>> My opinion (as a user, I have no authority here whatsoever)
>>>>
>>>> *1) About parsing colons in offsets with strptime*
>>>>
>>>> I think having %z support both +-HH:MM and +-HHMM would be the best
>>>> choice, as it seems the simplest for me as a user.
>>>> I'd go even further, making %z support ':' and 'Z', *a la glibc*.
>>>> This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
>>>>
>>>
>>> That is fine for parsing, but my issue with this is symmetry with
>>> strftime. If the same extensions are also implemented for formatting (I
>>> have a prototype) then you need some way to specify whether you want a :
>>> separator or not. The %z will have to remain without colon on formatting
>>> for backward compatibility.
>>>
>>> So I agree that the parser can be safely made more liberal in what it
>>> accepts, but the formatter must be strict and specific in what it produces.
>>>
>>>> I think this gives the best experience to the strptime user. It
>>>> basically makes the time-offset rfc3339
>>>> <https://tools.ietf.org/html/rfc3339> compatible.
>>>>
>>>
>>> Yes, that's the goal.
>>>
>>>> *2) Adding a handy function to build a datetime from a string serialized
>>>> with isoformat*
>>>> Absolutely agree on having an isoparse. That would be amazing, we can
>>>> even build it on top of 1).
>>>>
>>>
>>> ...and building it on top of 1 requires several extensions and variants.
>>> People here seem to be a bit taken aback by the scope of these extensions.
>>> I understand this reaction, but I maintain that most or all of this
>>> complexity is necessary if you want to implement this on top of strptime
>>> rather than a custom isoparse().
>>>
>>>> *Side note:*
>>>> I am not totally in favour of "%?:z" (probably because I am leaning
>>>> on %z doing the parsing for both and ?z will have no place on strftime).
>>>> I think this starts to add way too much complexity to just say "parse a
>>>> time-offset".
>>>>
>>>
>>> Again, what is the alternative? If you want a parser that accepts the
>>> output of isoformat() for all possible datetime values (except custom
>>> tzinfo) then it needs to support a missing tz offset as indicating a naive
>>> timestamp.
>>>
>>> You can say that the real source of the asymmetry here is not with my
>>> proposal but rather in the underlying strftime/strptime: on formatting, %z
>>> yields an empty string for a naive timestamp rather than producing an
>>> error. But on parsing, it refuses to parse a timestamp with no offset. A
>>> truly symmetric implementation would have accepted it as a naive
>>> timestamp.
>>>
>>> Too late for %z because it must remain backward compatible, but perhaps
>>> %:z can be made to accept a missing offset as a naive timestamp. The user
>>> can then check for naive timestamps and reject them if they are
>>> unacceptable in that context, rather than specifying whether a missing
>>> offset is acceptable or not in the format string. I have no problem with
>>> either solution.
>>>
>>>> *Implementation:*
>>>> I am happy to work with PaulG in the isoparse implementation if we
>>>> decide to go with it and if he wants to get involved :)
>>>>
>>>
>>> I have a working strptime:
>>> https://github.com/orent/cpython/tree/strptime_extensions
>>>
>>> isoparse() on top of this strptime is a trivial one-liner.
>>>
>>> Oren
>>>
>>>> *Thanks:*
>>>> Thanks for dedicating time to this, I think that even if minor this
>>>> would be a killer addition to 3.7 if we manage to get it through.
>>>>
>>>> On 21 October 2017 at 07:34, Oren Tirosh <[email protected]> wrote:
>>>>
>>>>> ok, let's try to separate the issues and choices on each one:
>>>>>
>>>>> 1. Extending strptime to support time zone offset with : separator:
>>>>> Should a single directive accept either hhmm or hh:mm or use two
>>>>> separate directives?
>>>>>
>>>>> 2. Round tripping of isoformat() back to datetime value:
>>>>> Implement custom isoparse() function or extend strptime so isoparse
>>>>> simply calls strptime with a default format?
>>>>> Support all variations produced by isoformat or just a subset?
>>>>> (Variations include with/without fraction, with/without tz and separator
>>>>> choice)
>>>>>
>>>>> I suggest: 1. separate directives, 2a. extend strptime, and 2b. support
>>>>> all variations. Do you have different preferences on any of these
>>>>> questions?
>>>>>
>>>>> I understand that the number of extensions to support this seems
>>>>> excessive to you.
>>>>>
>>>>> Technically, my proposed "%.f" is not really necessary. I added it for
>>>>> completeness. We can keep using ".%f" for non-optional fraction and define
>>>>> "%?f" to implicitly include the dot.
>>>>>
>>>>> The distinction between "%z", "%:z" and "%?:z" can also be narrowed
>>>>> down. This can be done, for example, by making "%z" and "%?s" always
>>>>> accept hhmm with or without the : separator.
>>>>>
>>>>> On Fri, 20 Oct 2017 at 17:16, Paul G <[email protected]> wrote:
>>>>>
>>>>>> I think this would be a much bigger change to the strptime interface
>>>>>> than is actually warranted, and probably would add in additional,
>>>>>> unnecessary complexity by introducing the concept of optional matches.
>>>>>> Adding the capability to match HH:MM offsets is a reasonable extension
>>>>>> partially because that is a standard representation that is currently
>>>>>> *not* covered by strptime, and the fact that that's how isoformat()
>>>>>> represents the offset just makes this lack all the more acute.
>>>>>>
>>>>>> I think it should be uncontroversial to add *one* of these two %z
>>>>>> extensions to Python 3 without getting bogged down in allowing a single
>>>>>> strptime string to match any output from `.isoformat`.
>>>>>>
>>>>>> That said, I'm also very much in favor of a `.isoparse` or
>>>>>> `.fromisoformat` constructor that *is* the inverse of `isoformat`, which
>>>>>> should solve the issue without sweeping changes to how `strptime` works.
>>>>>>
>>>>>> On 10/19/2017 04:07 PM, Oren Tirosh wrote:
>>>>>>> https://github.com/orent/cpython/tree/strptime_extensions
>>>>>>>
>>>>>>> %:z - matches +HH:MM
>>>>>>> %?:z - optional %:z
>>>>>>> %.f - equivalent to .%f
>>>>>>> %?.f - optional %.f
>>>>>>> %?t - matches ' ' or 'T'
>>>>>>>
>>>>>>> What they all have in common is that together they make it possible to
>>>>>>> write a strptime format that matches all possible output variations of
>>>>>>> datetime.__str__ / datetime.isoformat.
>>>>>>>
>>>>>>> The time zone not only supports the : separator but also allows making
>>>>>>> the entire component optional, as isoformat() will add it only for aware
>>>>>>> datetime objects. The seconds fraction is dropped from the default string
>>>>>>> representation if the datetime represents a whole second. Since it is
>>>>>>> dropped along with the decimal dot, I first made "%.f" that includes the
>>>>>>> dot and then created the optional variant. Finally, "%?t" can be used to
>>>>>>> accept a timestamp with either of the separators defined in iso8601.
>>>>>>>
>>>>>>> It is quite absurd that datetime cannot parse its own string
>>>>>>> representation. Using these extensions an .isoparse() method may be added
>>>>>>> that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports full
>>>>>>> round-tripping of all possible datetime values that do not use a custom
>>>>>>> tzinfo.
>>>>>>>
>>>>>>> Oren
>>>>>>>
>>>>>>> On Thu, 19 Oct 2017 at 17:06, Paul G <[email protected]> wrote:
>>>>>>>>
>>>>>>>> There is a new issue about the %z directive in strptime on the issue
>>>>>>>> tracker: https://bugs.python.org/issue31800 (linked to a few related
>>>>>>>> issues), and a linked PR expanding the definition of %z to match HH:MM:
>>>>>>>> https://github.com/python/cpython/pull/4015
>>>>>>>>
>>>>>>>> I think either adding a %:z directive or expanding the definition of %z
>>>>>>>> would be pretty important, and I think there's a good case to be made
>>>>>>>> for either one. To summarize the arguments for people on the mailing
>>>>>>>> list:
>>>>>>>>
>>>>>>>> The argument for expanding the definition of %z that I find strongest is
>>>>>>>> that according to the linux man pages (
>>>>>>>> http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z
>>>>>>>> generates +-HHMM in strftime, strptime is supposed to match "An
>>>>>>>> RFC-822/ISO 8601 standard timezone specification", and ISO 8601 uses
>>>>>>>> +-HH:MM, so if we're following those linux pages, we should be accepting
>>>>>>>> the version with the colon.
>>>>>>>>
>>>>>>>> The arguments that I find most compelling for adding a %:z directive
>>>>>>>> are:
>>>>>>>>
>>>>>>>> 1. maintains the symmetry between strftime and strptime
>>>>>>>> 2. allows users to be stricter about their datetime format
>>>>>>>> 3. has precedent in that GNU's `date` command accepts %z, %:z and
>>>>>>>>    %::z formats
>>>>>>>>
>>>>>>>> Can we establish some consensus on which should be done so that it can
>>>>>>>> be implemented?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Paul
_______________________________________________
Datetime-SIG mailing list
[email protected]
https://mail.python.org/mailman/listinfo/datetime-sig
The PSF Code of Conduct applies to this mailing list:
https://www.python.org/psf/codeofconduct/
