[Stuart Bishop <[email protected]>]
> Sorry I'm late. pytz author here.

Hi, Stuart! Nice to see you. Stay a while :-)

> Gosh you guys write a lot. I've tried to skim things, and will default
> to agreeing with Tim since it is usually the smart thing to do.

Excellent judgment. Although agreeing with Guido is mandatory, and he's
wrong about some things here ;-)

> A few notes from my skimming:
>
> - I want a boolean added to datetime instances, even if I don't like
> the name, because I can then deprecate pytz and its confusing API and
> implementation. I'm happy to work on Python implementation and
> documentation. It will save me time and effort in the long run.

Later you seem to say you'd prefer a 3-state flag instead, so not sure
you really mean "boolean" here.

> - Most of my thoughts got encoded in PEP-431. This would give us a
> datetime module that operates exactly the way it does today,

No. While 431 was highly obscure on this point, it turned out that
Lennart was determined to change arithmetic behavior. That can't fly,
for backward compatibility, and because even "aware" datetimes were
intended to use a "naive time" model internally.

Specifically, if you add timedelta(days=1) to a datetime today, you get
"same time tomorrow" (day goes up by 1, but hour, minute, second and
microsecond remain the same) in all cases. Even if a DST transition (or
base-offset change, or leap-second change) occurred. That's now called
"classic" arithmetic. The default behavior can't be changed.

What you seem to have in mind (accounting for two of the three known
reasons for why a local clock may jump: DST and base-offset changes, but
not leap second changes) is now called "timeline" (sometimes "strict")
arithmetic. According to Lennart, under PEP 431 timeline arithmetic
would always be used.

Under PEP 495, nothing about arithmetic changes. 495 is less ambitious,
only intending to supply the bit(s) needed to _allow_ timeline
arithmetic to be implemented as an option later. PEP 500 is about
supplying different arithmetics, but Guido hates PEP 500.

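To make the classic/timeline distinction above concrete, here is a minimal sketch. The `Eastern2015` class is a toy tzinfo invented for this illustration (hardcoded US Eastern rules for 2015 only, not real timezone data), and `timeline_add` is a hypothetical helper name, not something any PEP specifies. It only shows one plausible way to get timeline behavior: do the arithmetic in UTC and convert back.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class Eastern2015(tzinfo):
    """Toy US Eastern rules for 2015 only (illustration, not real tz data)."""

    DST_START = datetime(2015, 3, 8, 2)   # local wall time when DST begins
    DST_END = datetime(2015, 11, 1, 2)    # local wall time when DST ends

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        in_dst = self.DST_START <= naive < self.DST_END
        return timedelta(hours=1) if in_dst else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def timeline_add(dt, delta):
    """Hypothetical helper: add delta on the UTC timeline, then convert back."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)


tz = Eastern2015()
start = datetime(2015, 3, 7, 12, 0, tzinfo=tz)   # noon, the day before DST starts

# Classic arithmetic (today's default): "same time tomorrow".
classic = start + timedelta(days=1)              # 2015-03-08 12:00 EDT

# Timeline arithmetic: exactly 24 elapsed hours, so the wall clock reads 13:00.
timeline = timeline_add(start, timedelta(days=1))  # 2015-03-08 13:00 EDT

# Only 23 hours actually elapsed between the two "classic" wall-clock readings.
elapsed_classic = classic.astimezone(timezone.utc) - start.astimezone(timezone.utc)
```

The round trip through UTC relies on the default .fromutc(), which is adequate here only because the toy zone alternates between a zero and a non-zero DST offset.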
In the end, I expect timezone wrappers will supply factory functions,
either separate functions for "give me such-and-such a timezone using
classic arithmetic" and "give me such-and-such a timezone using timeline
arithmetic", or a single function specifying the desired timezone and an
optional flag to specify the arithmetic desired.

> but with the option of performing pytz style unambiguous datetime
> arithmetic

There was nothing optional about it in 431, and 495 doesn't address
arithmetic, except to make it _possible_ to implement timeline
arithmetic.

> without pytz and its confusing API.
> If the developer explicitly set the is_dst flag, then exceptions would
> be raised when trying to instantiate an ambiguous or invalid timestamp.
> For code that does not specify the new, optional flag things work as
> they do today and a best guess made when the localized datetime is
> constructed.

It's possible that 495 should do more in this direction. For now, it
specifies enough that someone who cares can easily write a function to
distinguish among "ambiguous time" (in a fold), "invalid time" (in a
gap), and "happy time" ;-) , and do whatever _they_ want (ignore some
subset, raise an exception, print a warning, supply a default, prompt
the user for more info, ...).

> - PEP-495 seems similar to PEP-431,

See above. 431 was about arithmetic, although it didn't say so clearly.
495 is _only_ about adding a flag.

> except that it attempts to allow things to continue in the face of
> an ambiguous or invalid localized datetime.
>
> The boolean flag is not tristate, so there is no way to have
> strict checking of input. It doesn't matter if the developer said
> 'whatever' and left the flag on the default, or cared enough to
> explicitly override it.

As above, it's possible 495 should do more. But it's hard to know when
to stop. For example, there are many ways of specifying a datetime,
including, e.g., using .combine() to paste a date and time together.
It's generally impossible to make a fold/gap determination on a time
alone - that's only possible in combination with a date. So does
.combine() also need to whine? It's simpler overall to leave it to
those users who care to check when they do care.

> - The rules in PEP-495 for utcoffset() and dst() to deal with
> ambiguous times only work in simple cases, as there are dst offsets both
> more and less than 1 hour, and there is no stdoffset since the offset
> can change at the same time (eg. Europe/Vilnius 1941, where the clocks
> ended up going backwards for summer time instead of forwards).

495 couldn't care less what causes folds and gaps - it's equally
applicable to all causes, and whether in isolation or combination. What
it _does_ assume is that a single bit suffices to resolve ambiguities:
that there is no case in which more than two UTC times have the same
spelling on a local clock.

The goal of the PEP is to supply that bit. The burden is on the tzinfo
supplier to set and use it correctly. The burden is also on the tzinfo
supplier to supply a .utcoffset() "that works" to convert a local time
to UTC, to supply a .dst() that returns whatever the tzinfo supplier
thinks it should return, and to supply a .fromutc() that sets the bit
correctly.

The default .fromutc() is indeed too weak to handle anything except
zones subject to nothing fancier than DST transitions alternating
between "zero" and "non-zero", and that's not changing either. Neither
will the default .fromutc() be changed to set first/fold/later/is_dst -
only a tzinfo implementer has enough info about how the timezone works
to set the bit correctly and semi-efficiently in all cases (the default
.fromutc() can only ask what the total UTC, and DST, offsets are at
specific microseconds in local time - it has no knowledge deeper than
that, because those are the only questions the tzinfo interface _can_ be
asked).

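As a rough sketch of the "users who care can check for themselves" idea, the function below classifies a local time as a fold, a gap, or neither by comparing the two UTC offsets the single bit selects. It assumes the spelling the flag eventually received in Python 3.6+ (the `fold` attribute) and a tzinfo that honors it. `FoldAwareEastern2015` and `classify` are names made up for this illustration, and the toy class hardcodes US Eastern rules for 2015 only.

```python
from datetime import datetime, timedelta, tzinfo


class FoldAwareEastern2015(tzinfo):
    """Toy fold-aware US Eastern rules for 2015 only (not real tz data)."""

    HOUR = timedelta(hours=1)
    GAP_START = datetime(2015, 3, 8, 2)    # 02:00-03:00 never shows on the clock
    FOLD_START = datetime(2015, 11, 1, 1)  # 01:00-02:00 shows on the clock twice

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        if self.GAP_START <= naive < self.GAP_START + self.HOUR:
            # In the gap: fold=0 uses the rules before the jump, fold=1 after.
            return self.HOUR if dt.fold else timedelta(0)
        if self.FOLD_START <= naive < self.FOLD_START + self.HOUR:
            # In the fold: fold=0 is the first (DST) reading, fold=1 the second.
            return timedelta(0) if dt.fold else self.HOUR
        in_dst = self.GAP_START <= naive < self.FOLD_START
        return self.HOUR if in_dst else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def classify(naive, tz):
    """Return 'fold', 'gap' or 'happy' for a naive local time in zone tz."""
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    if first == second:
        return "happy"
    # In a fold the earlier reading has the larger offset; in a gap, the smaller.
    return "fold" if first > second else "gap"


tz = FoldAwareEastern2015()
in_gap = classify(datetime(2015, 3, 8, 2, 30), tz)     # "gap"
in_fold = classify(datetime(2015, 11, 1, 1, 30), tz)   # "fold"
ordinary = classify(datetime(2015, 7, 1, 12, 0), tz)   # "happy"
```

Nothing in `classify` depends on why the offset changed, so the same check would apply to base-offset changes as well as DST. What to do with the answer (raise, warn, pick a default) is left to the caller.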
As to "more and less than 1 hour", yes, the PEP hasn't been updated to
clarify that "hour" _means_ "some number of microseconds" ;-)

> - Other APIs I know of, including Python's time module, use is_dst or
> isdst as the required boolean flag. As do the timezone databases
> containing the data we need. I think the argument against the is_dst
> flag name in PEP-495 is flaccid.

is_dst makes no sense for base-offset or leap-second transitions either;
"first"/"fold"/"later" make equally clear sense for all causes of folds.
But Guido hates leap seconds, seemingly intending to make it impossible
for anyone to support them directly (via overloading datetime arithmetic
operators), and so the case against "is_dst" is weaker now.

> - If there is an argument in favour of 'first' over 'is_dst', it is
> because occasionally there are timezone changes without a dst
> transition. If we call it is_dst, we agree that in a few rare
> historical cases we are going to have to lie.

There are only two tzinfo authors in the world ;-) (you and Gustavo),
and by all evidence you're both way more than bright enough to adapt to
any spelling ;-)

> - My argument in favour of 'is_dst' over 'first' is that this is what
> we have in the data we are trying to load. You commonly have
> a timestamp with a timezone abbreviation and/or offset. This can
> easily be converted to an is_dst flag.

You mean by using platform C library functions (albeit perhaps wrapped
by Python)?

> To convert it to a 'first' flag, we need to first parse the datetime,

I'm unclear on this. To get a datetime _at all_ the timestamp has to be
converted to calendar notation (year, month, ...). Which is what I'm
guessing "parse" means here. That much has to be done in any case.

> determine the transition points that year, and then which side of
> the nearest transition point it lies. Note that there can be more
> than 2 transition points in a year, and no api has been discussed for
> discovering them.

Python doesn't need such an API. It needs the tzinfo author to
implement .utcoffset(), .dst(), and .fromutc() according to whatever
rules a timezone requires. Python code would convert the timestamp to
UTC calendar notation first, then use .astimezone() to convert to
whatever "timezone abbreviation and/or offset" was specified.
astimezone() in turn gets everything it needs from the tzinfo's
.fromutc().

I'm unclear anyway on why you'd trust an external is_dst flag to be
reliable in the funky cases where, e.g., base-offset and DST transitions
coincide. You either think it's important to handle such cases or you
don't. If you do, what do _you_ think tm_isdst means in such cases?

If you're relying on external code to compute is_dst for you, then it
doesn't matter what anyone in the Python world thinks it should mean.
It only matters what the universe of C library authors thought it should
mean, assuming they were even aware of such cases. The relevant
standards are no help at all in such edge cases. The web is filled with
complaints about puzzling tm_isdst behavior in edge cases, and no two
implementations seem to agree on what -1 "really means" even in
seemingly straightforward cases.

I'd rather that Python tzinfo authors implement exactly what _they_
think a timezone's rules really are - which indeed requires analyzing a
time using all the timezone's internal rules.

> - I think datetime should consider 1 day == 24 hours and not have
> concepts like years or months, just like it does today. As others
> suggested, a separate module dealing with leap years and variable
> length days may be useful to some people, as would leapsecond support
> for astronomers and astrologers. But if the default implementation
> gives different results to all the other tools on your system, people
> will think the default is wrong.

Not sure what you mean here without specific examples of what you have
in mind.
But, as above, classic arithmetic will remain the default regardless -
it's a dozen years too late to change that, even if everyone wanted to
(and - surprise - everyone doesn't ;-) ).

> - Offsets should ideally be declared in seconds. Last I looked, the
> current Python implementation rounds them to the nearest minute and it
> would be nice to fix that. These are almost always historical, dating
> from when noon was when the sun was at its highest point above the
> capital (eg. Europe/Amsterdam before 1938)

Offsets are currently required to be a multiple of a minute (no rounding
is done - an exception is raised if an offset is not a multiple of a
minute, with magnitude less than 24*60 (the number of minutes in a
day)). That should change, and Alexander has already done most of the
work for it, but it's not in the scope of this PEP. "The flag" can be
added with or without that change.

> - There are cases where there are gaps at the end of DST, and folds at
> the beginning of DST, when the timezone offsets were changed
> simultaneously with the dst flag.

That's fine, provided again that a single bit suffices to resolve
ambiguous times on the local clock. A fold is a fold and a gap is a
gap, regardless of cause. It's only if we, e.g., _name_ the flag
"is_dst" that someone is likely to erroneously assume that the flag
always _means_ "and so there's a fold when it changes from True to
False, and a gap when it changes from False to True".

> - Microsoft's timezone database does not contain historical
> information, which is why databases that need support under Windows
> like PostgreSQL include the IANA/Olson database.
>
> - Thank you to everyone who has been working on this. I've wanted it
> for a long, long time but never got around to remembering how to write
> C.

Au contraire - thank _you_ for pytz! That was such an heroic effort to
overcome the lack of a bit that it's legendary :-) We'll get this all
to work cleanly in the end.
_______________________________________________
Datetime-SIG mailing list
[email protected]
https://mail.python.org/mailman/listinfo/datetime-sig
The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/
