Hi Tim, (tl;dr I think your latest proposal re PEP 495 is great.)
I think we're still miscommunicating somewhat. Before replying point by point, let me just try to explain what I'm saying as clearly as I can. Please tell me precisely where we part ways in this analysis.

Consider two models for the meaning of a "timezone-aware datetime object". Let's just call them Model A and Model B.

In Model A, an aware datetime (in any timezone) is nothing more than an alternate (somewhat complexified for human use) spelling of a Unix timestamp, much like a timedelta is just a complexified spelling of some number of microseconds. In this model, there's a bijection between aware datetimes in any two timezones. (This model requires the PEP 495 flag, or some equivalent. Technically, this model _could_ be implemented by simply storing a Unix timestamp and a timezone name, and doing all date/time calculations at display time.) In this model, "Nov 2 2014 1:30am US/Eastern fold=1" and "Nov 2 2014 6:30am UTC" are just alternate spellings of the _same_ underlying timestamp.

Characteristics of Model A:

* There's no issue with comparisons or arithmetic involving datetimes in different timezones; they're all just Unix timestamps under the hood anyway, so ordering and arithmetic are always obvious and consistent: always equivalent to simple integer arithmetic with Unix timestamps.

* Conversions between timezones are always unambiguous and lossless: they're just alternate spellings of the same integer, after all.

* In this model, timeline arithmetic everywhere is the only option. Every non-UTC aware datetime is just an alternate spelling of an equivalent UTC datetime / Unix timestamp, so in a certain sense you're always doing "arithmetic in UTC" (or "arithmetic with Unix timestamps"), but you can spell it in whichever timezone you like. In this model, there's very little reason to consider arithmetic in non-UTC timezones problematic; it's always consistent and predictable and gives exactly the same results as converting to UTC first.
For sizable systems it may still be good practice to do everything internally in UTC and convert at the edges, but the reasons are not strong: mostly just avoiding interoperability issues with databases or other systems that don't implement the same model, or have poor timezone handling.

* In this model, "classic" arithmetic doesn't even rise to the level of "attractive nuisance"; it's simply "wrong arithmetic," because you get different results when working with the "same time" represented in different timezones, which violates the core axiom of the model: it's no longer simply arithmetic with Unix timestamps.

I don't believe there's anything wrong with Model A. It's not the right model for _all_ tasks, but it's simple, easy to understand, fully consistent, and useful for many tasks. On the whole, it's still the model I find most intuitive and would prefer for most of the timezone code I personally write (and it's the one I actually use today in practice, because it's the model of pytz).

Now Model B. In Model B, an "aware datetime" is a "clock face" or "naive" datetime with an annotation of which timezone it's in. A non-UTC aware datetime in Model B doesn't inherently know what POSIX timestamp it corresponds to; that depends on concepts that are outside its naive model of local time, in which time never jumps or goes backwards. Model B is what Guido was describing in his email about an aware datetime in 2020: he wants an aware datetime to mean "the calendar says June 3, the clock face says noon, and I'm located in US/Eastern" and nothing more.

Characteristics of Model B:

* Naive (or "classic", or "move the clock hands") arithmetic is the only kind that makes sense under Model B.

* As Guido described, if you store an aware datetime and then your tz database is updated before you load it again, Model A and Model B aware datetimes preserve different invariants.
A Model A aware datetime will preserve the timestamp it represents, even if that means it now represents a different local time than before the zoneinfo change. A Model B aware datetime will preserve the local clock time, even though it now corresponds to a different timestamp.

* You can't compare or do arithmetic between datetimes in different timezones under Model B; you need to convert them to the same timezone first (which may require resolving an ambiguity).

* Maintaining a `fold` attribute on datetimes at all is a departure from Model B, because it represents a bit of information that simply doesn't exist within Model B's naive-clock-time model.

* Under Model B, conversions between timezones are lossy during a fold in the target timezone, because two different UTC times map to the same Model B local time.

These models aren't chosen arbitrarily; they're the two models I'm aware of for what a "timezone-aware datetime" could possibly mean that preserve consistent arithmetic and total ordering in their allowed domains (in Model A, all aware datetimes in any timezone can interoperate as a single domain; in Model B, each timezone is a separate domain).

A great deal of this thread (including most of my earlier messages and, I think, even parts of your last message here that I'm replying to) has consisted of proponents of one of these two models arguing that behavior from the other model is wrong or inferior or buggy (or an "attractive nuisance"). I now think these assertions are all wrong :-) Both models are reasonable and useful, and in fact both are capable enough to handle all operations; it's just a question of which operations they make simple. Model B people say "just do all your arithmetic and comparisons in UTC"; Model A people say "if you want Model B, just use naive datetimes and track the implied timezone separately."

I came into this discussion assuming that Model A was the only sensible way for a datetime library to behave.
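For concreteness, here's a small sketch of how the two models answer "how much time between these two moments?" differently. It uses the stdlib `zoneinfo` module (Python 3.9+); the zone and the dates are my own illustrative choices, not anything prescribed by either model:

```python
# Contrast Model B ("move the clock hands") arithmetic with Model A
# (timeline / POSIX-timestamp) arithmetic across a DST fall-back.
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# Half an hour before, and (on the clock) three hours after, the start of
# the 2014-11-02 fall-back in US/Eastern, where 1:00-2:00am repeats.
a = datetime(2014, 11, 2, 0, 30, tzinfo=eastern)
b = datetime(2014, 11, 2, 3, 30, tzinfo=eastern)

# Model B (classic): same-zone subtraction just compares the clock faces.
print(b - a)  # 3:00:00

# Model A (timeline): convert to UTC first; the repeated hour is counted.
print(b.astimezone(timezone.utc) - a.astimezone(timezone.utc))  # 4:00:00
```

Today's datetime gives the Model B answer for the same-zone subtraction; you get the Model A answer only by converting both operands to UTC yourself.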
Now (thanks mostly to Guido's note about dates in 2020), I've been convinced that Model B is also reasonable, and preferable for some uses. I've also been convinced that Model B is the dominant influence and intended model in datetime's design, and that that's very unlikely to change (even in a backwards-compatible way), so I'm no longer advocating a change.

Datetime.py, unfortunately, has always mixed behavior from the two models (interzone operations are all implemented from a Model A viewpoint; intrazone operations are Model B). Part of the problem with this is that it results in a system that looks like it ought to have total ordering and consistent arithmetic, but doesn't. The bigger problem is that it has allowed people to come to the library from either a Model A or Model B viewpoint and find enough behavior confirming their mental model to assume they were right, and to assume any behavior that doesn't match their model is a bug. That's what happened to Stuart, and that's why pytz implements Model A, which has thus encouraged large swathes of Python developers to even more confidently presume that Model A is the intended model.

I think your latest proposal for PEP 495 (always ignore `fold` in all intra-zone operations, and push the inconsistency into inter-zone comparisons - which were already inconsistent - instead) is by far the best option for bringing lossless timezone-conversion round-trips to Model B. Instead of saying (as earlier revisions of PEP 495 did) "we claim we're really Model B, but we're going to introduce even more Model A behaviors, breaking the consistency of Model B in some cases - good luck keeping it straight!", it says "we're sticking with Model B, in which `fold` is meaningless when you're working within a timezone, but in the name of practical usability we'll still track `fold` internally after a conversion, so you don't have to do it yourself in case you want to convert to another timezone later."
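To make that behavior concrete, here's a sketch against a PEP 495 implementation (the stdlib `zoneinfo` module of Python 3.9+; the zone and the two instants are my own choices): `fold` is ignored when comparing within a zone, yet it still makes the round trip through a fold lossless:

```python
# Two distinct UTC instants that both read 1:30am in US/Eastern on
# 2014-11-02; the second pass through the repeated hour gets fold=1.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

u1 = datetime(2014, 11, 2, 5, 30, tzinfo=timezone.utc)  # 1:30am EDT
u2 = datetime(2014, 11, 2, 6, 30, tzinfo=timezone.utc)  # 1:30am EST

e1 = u1.astimezone(eastern)
e2 = u2.astimezone(eastern)
print(e1.fold, e2.fold)  # 0 1

# Intra-zone, fold is ignored: the two local times compare equal...
print(e1 == e2)  # True

# ...but the tracked fold bit makes conversion back to UTC lossless.
print(e1.astimezone(timezone.utc) == u1)  # True
print(e2.astimezone(timezone.utc) == u2)  # True
```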
If the above analysis makes any sense at all to anyone, and you think something along these lines (but shorter and more carefully edited) would make a useful addition to the datetime docs (either as a tutorial-style "intro to how datetime works and how to think about aware datetimes" or as an FAQ), I'd be very happy to write that patch.

Now on to your message:

[Tim]
> Classic arithmetic is equivalent to doing integer arithmetic on
> integer POSIX timestamps (although with wider range the same across
> all platforms, and extended to microsecond precision). That's hardly
> novel - there's a deep and long history of doing exactly that in the
> Unix(tm) world. Which is Guido's world. There "shouldn't be"
> anything controversial about that. The direct predecessor was already
> best practice in its world. How that could be considered a nuisance
> seems a real strain to me.

Unless I'm misunderstanding what you are saying (always likely!), I think this is just wrong. POSIX timestamps are a representation of an instant in time (a number of seconds since the epoch _in UTC_). If you are doing any kind of "integer arithmetic on POSIX timestamps", you are _always_ doing timeline arithmetic. Classic arithmetic may be many things, but the one thing it definitively is _not_ is "arithmetic on POSIX timestamps."

This is easy to demonstrate: take one POSIX timestamp, convert it to some timezone with DST, add 86400 seconds to it (using "classic arithmetic") across a DST gap or fold, and then convert back to a POSIX timestamp, and note that you don't have a timestamp 86400 seconds away from the first timestamp. If you were doing simple "arithmetic on POSIX timestamps", such a result would not be possible.

In Model A (the one that Lennart and myself and Stuart and Chris have all been advocating during all these threads), all datetimes (in any timezone) are unambiguous representations of a POSIX timestamp, and all arithmetic is "arithmetic on POSIX timestamps."
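That demonstration can be run directly with the stdlib `zoneinfo` module (Python 3.9+); the zone and starting instant are my own illustrative choices:

```python
# Show that "classic" (naive) arithmetic on an aware datetime is NOT
# integer arithmetic on POSIX timestamps: add 86400 seconds across the
# 2014 US/Eastern fall-back and measure the real elapsed time.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# Noon the day before the fall-back transition (EDT, UTC-4).
start = datetime(2014, 11, 1, 12, 0, tzinfo=eastern)

# Classic arithmetic: the clock hands move forward exactly one calendar
# day, landing on noon Nov 2 - which is now EST, UTC-5.
end = start + timedelta(seconds=86400)

elapsed = end.timestamp() - start.timestamp()
print(elapsed)  # 90000.0 -- 25 real hours, not 86400 seconds
```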
That right there is the definition of timeline arithmetic. So yes, I agree with you that it's hard to consider "arithmetic on POSIX timestamps" an attractive nuisance :-)

> Where it gets muddy is extending classic arithmetic to aware datetimes
> too.

If by "muddy" you mean "not in any way 'arithmetic on POSIX timestamps' anymore." :-) I don't even know what you mean by "extending to aware datetimes" here; the concept of "arithmetic on POSIX timestamps" has no meaning at all with naive datetimes (unless you're implicitly assuming some timezone), because naive datetimes don't correspond to any particular instant, whereas a POSIX timestamp does.

> Then compounding the conceptual confusion by adding timeline
> interzone subtraction and comparison.

Yes, that addition (of Model A behavior into a Model B world) has caused plenty of confusion! It's the root cause for most of the content on this mailing list so far, I think :-)

[Carl]
>> If datetime did naive arithmetic on tz-annotated datetimes, and also
>> refused to ever implicitly convert them to UTC for purposes of
>> cross-timezone comparison or arithmetic, and included a `fold` parameter
>> not on the datetime object itself but only as an additional input
>> argument when you explicitly convert from some other timezone to UTC,
>> that would be a consistent view of the meaning of a tz-annotated
>> datetime, and I wouldn't have any problem with that.

[Tim]
> I would. Pure or not, it sounds unusable: when I convert _from_ UTC
> to a local zone, I have no idea whether I'll end up in a gap, a fold,
> or neither. And so I'll have no idea either what to pass _to_
> .utcoffset() when I need to convert back to UTC. It doesn't solve the
> conversion problem. It's a do-it-yourself kit missing the most
> important piece. "But .fromutc() could return the right flag to pass
> back later" isn't attractive either. Then the user ends up needing to
> maintain their own (datetime, convert_back_flag) pairs.
> In which
> case, why not just store the flag _in_ the datetime? Only tzinfo
> methods would ever need to look at it.

Yes, I agree with you here. I think your latest proposal for PEP 495 does a great job of providing this additional convenience for the user without killing the intra-timezone Model B consistency. I just wish that the inconsistent inter-timezone operations weren't supported at all, but I know it's about twelve years too late to do anything about that, other than document some variant of "you shouldn't compare or do arithmetic with datetimes in different timezones; if you do, you'll get inconsistent results in some cases around DST transitions. Convert to the same timezone first instead."

[Tim]
>> But that isn't datetime's view, at least not consistently. The problem
>> isn't datetime's choice of arithmetic; it's just that sometimes it wants
>> to treat a tz-annotated datetime as one thing, and sometimes as another.
>
> How many times do we need to agree on this? ;-)

Everybody all together now, one more time! :-) Until your latest proposal on PEP 495, I wasn't sure we really did agree on this, because it seemed you were still willing to break the consistency of Model B arithmetic in order to gain some of the benefits of Model A (that is, to introduce _even more_ of this context-dependent ambiguity as to what a tz-annotated datetime means). But your latest proposal fixes that in a way I'm quite happy with, given where we are.

> Although the
> conceptual fog has not really been an impediment to using the module
> in my experience.
>
> In yours? Do you use datetime? If so, do you trip over this?

No, because I use pytz, in which there is no conceptual fog, just strict Model A (and an unfortunate API). I didn't get to experience the joy of this conceptual fog until I started arguing with you on this mailing list! And now I finally feel like I'm seeing through that fog a bit. I hope I'm right :-)

Carl
