I could not accept a PEP that leads to different datetime being considered == but having a different hash (*unless* due to a buggy tzinfo subclass implementation -- however no historical timezone data should ever depend on such a bug).
I'm much less concerned about < being intransitive in edge cases. I also don't particularly care about == following from the difference being zero. Still, unless we're constrained by backward compatibility, I would rather not add equivalence between *any* two datetimes whose tzinfo is not the same object -- even if we can infer that they both must refer to the same instant. On Tue, Sep 1, 2015 at 8:59 AM, Alexander Belopolsky < [email protected]> wrote: > > > On Mon, Aug 31, 2015 at 9:01 PM, Tim Peters <[email protected]> wrote: > >> [Alex] >> > It's getting late in my TZ, but what you are saying below sounds like a >> > complaint that if you put t=second 01:30 as a key in the dictionary, you >> > cannot later retrieve it by looking up t.astimezone(timezone.utc). >> >> I don't grasp that. What I am saying should be very clear: under all >> schemes so far, there are datetimes x and y such that x == y but >> hash(x) != hash(y). You see that or you don't. If you don't, I'll >> keep trying until you do ;-) So do you see that? >> >> > I do (in the morning.) > >> .. >> > [Alex] > >> >> > Maybe if we decide to do something with the arithmetic, we will be able >> to >> > fix this wart as well. >> > > [Tim] > >> Doubt it - this has nothing to do with arithmetic I can see. It's a >> consequence of wanting to ignore `fold` in contexts where it really >> does make a difference. __hash__() is one such place. >> > > Arithmetic and comparisons are intertwined as long as you require that > not(a - b) ⇔ a==b. The main problem for hash as I see it is that x == y > may or may not call x.utcoffset() depending on the value of y. This is a > problem for hash(x) which should decide whether > >> >> Like I said at the start, it's a puzzle. >> > > Let's formulate the puzzle: Define datetime.__hash__ so that given PEP 495 > rules for datetime.__eq__, x == y implies hash(x) == hash(y). > > First (trivial) observation: a solution exists, e.g. hash(x) == 0. This > is not a very practical solution, but shows that the puzzle is not a > logical impossibility. > > Second observation: We cannot improve on hash(x) == 0 without some > knowledge of what timezones are known to the system. Proof: let u1 < u2 be > two arbitrary UTC times. We can always construct a timezone (FoldZone) > where u1 and u2 map to the same local time. All we need to do is to create > a fold of size u2 - u1 at some time u between u1 and u2. Let t1 = > u1.astimezone(FoldZone) > and t2 = u2.astimezone(FoldZone). By construction, t1 == t2, t1.fold = 0 > and t2.fold = 1. If x == y implies hash(x) == hash(y), then u1 == t1 > implies hash(u1) == hash(t1) and similarly u2 == t2 implies hash(u2) == > hash(t2) and t1 == t2 implies hash(t1) == hash(t2). Since hash values are > integers and == is transitive for integers, from a chain hash(u1) == > hash(t1), hash(t1) == hash(t2), hash(t2) == hash(u2 )we conclude that > hash(u1) == hash(u2) and therefore the only solution is hash(x) == const. > > This sounds discouraging, but note that the FoldZone that we constructed > is rather unrealistic because depending on the values of u1 and u2, the > size of the fold can range from microseconds to centuries. > > Third observation: If we have only one variable offset timezone (Local), > then we can solve the problem by defining datetime.__hash__(self) as for > example, hash(self.astimezone(Local).replace(fold=0) - datetime(1, 1, 1, > tzinfo=Local)). Note that in the last expression, hash is taken of a > timedelta object, so the definition is not circular. (A proof that x == y > implies hash(x) == hash(y) in this case is left as an exercise for the > reader.:-) > > Fourth observation: A solution for one variable offset timezone > generalizes to the case of an arbitrary number of such timezones. A > theoretical construction is to simply iterate x = > x.astimezone(Zone).replace(fold=0) over all the zones known to the system, > but certainly a more efficient algorithm can be devised to to achieve the > same result in a single lookup in a specially crafted table. > > So the puzzle is not unsolvable, but how much of it has to be solved in > PEP 495? I would say not much. I agree with Tim that non-transitivity of > == and the violation of the hash invariant need to be mentioned in the > PEP. However, since PEP 495 by itself does not introduce any new tzinfo > implementations and the existing fixed offset timezones don't suffer from > this problem, I think we can leave the final resolution to the > timezone.local or the zoneinfo PEP. > > An important lesson is in the second observation. To solve the hash > puzzle, we need to have a global view of the totality of timezones that > will be supported by the system. > > _______________________________________________ > Datetime-SIG mailing list > [email protected] > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido)
_______________________________________________ Datetime-SIG mailing list [email protected] https://mail.python.org/mailman/listinfo/datetime-sig The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/
