No credit for anything other than the "extra credit" section. Partial credit for that. Study that printout and you should understand what Tim was saying.
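
For anyone following along, here is a minimal sketch that reproduces both printouts from the quoted message. It assumes CPython 3 with the standard pickle, pickletools, struct, and datetime modules. The variable names (dt, payload, extra, and so on) are mine and purely illustrative. The second pickle is assembled by hand from the opcodes listed in the "extra credit" disassembly, so treat it as an illustration rather than something pickle.dumps() would emit.

```python
import pickle
import pickletools
import struct
from datetime import datetime

# Value chosen to match the 10-byte payload quoted in this thread.
dt = datetime(2015, 9, 14, 21, 6, 42)
payload = b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'

# Protocol 3 writes the datetime state with SHORT_BINBYTES.
data = pickle.dumps(dt, protocol=3)
pickletools.dis(data)

# Locate the payload rather than hardcoding an offset, then look at
# the two bytes in front of it: the opcode and the one-byte length.
idx = data.index(payload)
opcode = data[idx - 2:idx - 1]
length = data[idx - 1]
print(opcode, length)

# The constructor accepts the raw 10-byte state directly.
from_state = datetime(payload)
print(from_state)

# Hand-assembled pickle following the "extra credit" listing: the
# datetime class is called with a tuple of small integers instead of
# a byte string.
extra = b''.join([
    b'\x80\x03',                     # PROTO 3
    b'cdatetime\ndatetime\n',        # GLOBAL 'datetime datetime'
    b'q\x00',                        # BINPUT 0
    b'(',                            # MARK
    b'M' + struct.pack('<H', 2014),  # BININT2 2014
    b'K\x09',                        # BININT1 9
    b'K\x0e',                        # BININT1 14
    b'K\x15',                        # BININT1 21
    b'K\x06',                        # BININT1 6
    b'K\x2a',                        # BININT1 42
    b't',                            # TUPLE (back to the MARK)
    b'q\x01',                        # BINPUT 1
    b'R',                            # REDUCE
    b'q\x02',                        # BINPUT 2
    b'.',                            # STOP
])
pickletools.dis(extra)
from_extra = pickle.loads(extra)
print(from_extra)
```

Both pickles end in the same GLOBAL ... REDUCE shape; only the argument tuple handed to the datetime class differs.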

> On Sep 14, 2015, at 9:19 PM, Random832 <[email protected]> wrote:
>
>> On Mon, Sep 14, 2015, at 18:09, Tim Peters wrote:
>> Sorry, I'm not arguing about this any more. Pickle doesn't work at
>> all at the level of "count of bytes followed by a string".
>
> The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes
> indeed* "count of bytes followed by a string".
>
>> If you
>> want to make a pickle argument that makes sense, I'm afraid you'll
>> need to become familiar with how pickle works first. This is not the
>> place for a pickle tutorial.
>>
>> Start by learning what a datetime pickle actually is.
>> pickletools.dis() will be very helpful.
>
>     0: \x80 PROTO      3
>     2: c    GLOBAL     'datetime datetime'
>    21: q    BINPUT     0
>    23: C    SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00'
>    35: q    BINPUT     1
>    37: \x85 TUPLE1
>    38: q    BINPUT     2
>    40: R    REDUCE
>    41: q    BINPUT     3
>    43: .    STOP
>
> The payload is ten bytes, and the byte immediately before it is in fact
> 0x0a. If I pickle any byte string under 256 bytes long by itself, the
> byte immediately before the data is the length. This is how I initially
> came to the conclusion that "count of bytes followed by a string" was
> valid.
>
> I did, before writing my earlier post, look into the high-level aspects
> of how datetime pickle works - it uses __reduce__ to create up to two
> arguments, one of which is a 10-byte string, and the other is the
> tzinfo. Those arguments are passed into the date constructor and
> detected by that constructor - for example, I can call it directly with
> datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result
> as unpickling.
>
> At the low level, the part that represents that first argument does
> indeed appear to be "count of bytes followed by a string". I can add to
> the count, add more bytes, and it will call the constructor with the
> longer string. If I use pickletools.dis on my modified value the output
> looks the same except for, as expected, the offsets and the value of the
> argument to the SHORT_BINBYTES opcode.
>
> So, it appears that, as I was saying, "wasted space" would not have been
> an obstacle to having the "payload" accepted by the constructor (and
> produced by __reduce__ ultimately _getstate) consist of "a byte string
> of >= 10 bytes, the first 10 of which are used and the rest of which are
> ignored by python <= 3.5" instead of "a byte string of exactly 10
> bytes", since it would have accepted and produced exactly the same
> pickle values, but been prepared to accept larger arguments pickled from
> future versions.
>
> For completeness: Protocol version 2 and 1 use BINUNICODE on a
> latin1-to-utf8 version of the byte string, with a similar "count of
> bytes followed by a string" (though the count of bytes is of UTF-8
> bytes). Protocol version 0 uses UNICODE, terminated by \n, and a literal
> \n is represented by \\u000a. In all cases some extra data around the
> value sets it up to call "codecs.encode(..., 'latin1')" upon unpickling.
>
> So have I shown you that I know enough about the pickle format to know
> that permitting a longer string (and ignoring the extra bytes) would
> have had zero impact on the pickle representation of values that did not
> contain a longer string? I'd already figured out half of this before
> writing my earlier post; I just assumed *you* knew enough that I
> wouldn't have to show my work.
>
> Extra credit:
>     0: \x80 PROTO      3
>     2: c    GLOBAL     'datetime datetime'
>    21: q    BINPUT     0
>    23: (    MARK
>    24: M        BININT2    2014
>    27: K        BININT1    9
>    29: K        BININT1    14
>    31: K        BININT1    21
>    33: K        BININT1    6
>    35: K        BININT1    42
>    37: t        TUPLE      (MARK at 23)
>    38: q    BINPUT     1
>    40: R    REDUCE
>    41: q    BINPUT     2
>    43: .    STOP
> _______________________________________________
> Datetime-SIG mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/datetime-sig
> The PSF Code of Conduct applies to this mailing list:
> https://www.python.org/psf/codeofconduct/
