#33465: Introduce empty __slots__ protocol for SafeString & SafeData
-------------------------------------+-------------------------------------
     Reporter:  Keryn Knight         |                    Owner:  Keryn
         Type:                       |  Knight
  Cleanup/optimization               |                   Status:  assigned
    Component:  Utilities            |                  Version:  dev
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Description changed by Keryn Knight:

Old description:

> This is a case-by-case proposal ultimately referencing #12826
>
> Because `SafeString` is used ''a lot'' and is otherwise supposed to be
> treatable as a untainted `str` we should be able to (AFAIK) update it +
> it's inheritance chain to use `__slots__ = ()` whilst still allowing
> custom subclasses of either to add additional attributes. By defining
> `__slots__` as empty on `SafeString` (**and** `SafeData`) we'd avoid
> creation of a `__dict__` on the instance, which mirrors the `str()`
> behaviour.
>
> According to pympler, currently in Python `3.10` using the following back
> of the napkins strings:
> {{{
> In [4]: s = "test"  # this might be interned, as a short string?
>
> In [5]: s2 = "test" * 100
>
> In [6]: s3 = SafeString("test")
>
> In [7]: s4 = SafeString("test" * 100)
> }}}
> we get:
> {{{
> In [8]: asizeof(s)  # str
> Out[8]: 56
>
> In [9]: asizeof(s2)  # str
> Out[9]: 456
>
> In [10]: asizeof(s3)  # SafeString
> Out[10]: 208
>
> In [11]: asizeof(s4)  # SafeString
> Out[11]: 608
> }}}
> But if we swap out the implementation to be slots'd, it looks more like:
> {{{
> In [8]: asizeof(s)  # str
> Out[8]: 56
>
> In [9]: asizeof(s2)  # str
> Out[9]: 456
>
> In [10]: asizeof(s3)  # SafeString
> Out[10]: 104
>
> In [11]: asizeof(s4)  # SafeString
> Out[11]: 504
> }}}
>
> So we're "saving" `104 bytes` per `SafeString` created, by the look of
> it. I presume it to be some fun implementation detail of something
> somewhere that it is allegedly accounting for more than `56` bytes, which
> is the `asizeof({})`
>
> A quick and dirty check over the test suite suggests that for me locally,
> running `14951 tests in 512.912s` accounted for `949.0 MB` of
> SafeStrings, checked by just incrementing a global integer of bytes
> (using `SafeString.__new__` and `--parallel=1`) and piping that to
> `filesizeformat`, so y'know, ''room for error''.
> After the patch, the same tests accounted for `779.4 MB` of `SafeString`,
> "saving" `170 MB` overall.
>
> The only functionality this would preclude -- as far as I know -- is no
> longer being able to bind arbitrary values to an instance like so:
> {{{
> s = SafeString('test')
> s.test = 1
> }}}
> which would raise `AttributeError` if `__slots__` were added, just like
> trying to assign attributes to `str()` directly does.
>
> I don't believe this will have any marked performance change, as neither
> `SafeString` nor `SafeData` actually have any extra attributes, only
> methods.
>
> I have a branch which implements this, and tests pass for me locally.

New description:

 This is a case-by-case proposal ultimately referencing #12826

 Because `SafeString` is used ''a lot'' and is otherwise supposed to be
 treatable as a untainted `str` we should be able to (AFAIK) update it +
 it's inheritance chain to use `__slots__ = ()` whilst still allowing
 custom subclasses of either to add additional attributes. By defining
 `__slots__` as empty on `SafeString` (**and** `SafeData`) we'd avoid
 creation of a `__dict__` on the instance, which mirrors the `str()`
 behaviour.

 According to pympler, currently in Python `3.10` using the following back
 of the napkins strings:
 {{{
 In [4]: s = "test"  # this might be interned, as a short string?

 In [5]: s2 = "test" * 100

 In [6]: s3 = SafeString("test")

 In [7]: s4 = SafeString("test" * 100)
 }}}
 we get:
 {{{
 In [8]: asizeof(s)  # str
 Out[8]: 56

 In [9]: asizeof(s2)  # str
 Out[9]: 456

 In [10]: asizeof(s3)  # SafeString
 Out[10]: 208

 In [11]: asizeof(s4)  # SafeString
 Out[11]: 608
 }}}
 But if we swap out the implementation to be slots'd, it looks more like:
 {{{
 In [8]: asizeof(s)  # str
 Out[8]: 56

 In [9]: asizeof(s2)  # str
 Out[9]: 456

 In [10]: asizeof(s3)  # SafeString
 Out[10]: 104

 In [11]: asizeof(s4)  # SafeString
 Out[11]: 504
 }}}

 So we're "saving" `104 bytes` per `SafeString` created, by the look of it.
 I presume it to be some fun implementation detail of something somewhere
 that it is allegedly accounting for more than `64` bytes, which is the
 `asizeof({})`

 A quick and dirty check over the test suite suggests that for me locally,
 running `14951 tests in 512.912s` accounted for `949.0 MB` of SafeStrings,
 checked by just incrementing a global integer of bytes (using
 `SafeString.__new__` and `--parallel=1`) and piping that to
 `filesizeformat`, so y'know, ''room for error''.
 After the patch, the same tests accounted for `779.4 MB` of `SafeString`,
 "saving" `170 MB` overall.

 The only functionality this would preclude -- as far as I know -- is no
 longer being able to bind arbitrary values to an instance like so:
 {{{
 s = SafeString('test')
 s.test = 1
 }}}
 which would raise `AttributeError` if `__slots__` were added, just like
 trying to assign attributes to `str()` directly does.

 I don't believe this will have any marked performance change, as neither
 `SafeString` nor `SafeData` actually have any extra attributes, only
 methods.

 I have a branch which implements this, and tests pass for me locally.

--

-- 
Ticket URL: <https://code.djangoproject.com/ticket/33465#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/067.c7b1a3ca916354f24a9a104ec994559a%40djangoproject.com.

Reply via email to