#36748: Postgres UNNEST optimisation of bulk_create  cannot handle fields with
`get_placeholder`.
-------------------------------------+-------------------------------------
     Reporter:  Chris Wesseling      |                    Owner:  (none)
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  5.2
  (models, ORM)                      |
     Severity:  Release blocker      |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

 * severity:  Normal => Release blocker
 * stage:  Unreviewed => Accepted


Old description:

> In order to allow fields to define a different placeholder than `"%s"` in
> https://github.com/django/django/blob/97acd4d2f92eef8c285bac070d437bf0fd52e071/django/db/models/sql/compiler.py#L1769
> `BaseSQLInsertCompiler.assemble_as_sql` does:
>
> {{{
>         get_placeholders = [getattr(field, "get_placeholder", None) for
> field in fields]
> }}}
>
> and calls those functions with `(value, compiler, connection)` to
> generate the placeholders.
>
> These alternative placeholders are not always UNNESTable. e.g. `"%s::{}"`
> for arrays and and ranges, which we tried to work-around by filtering
>
> {{{
>             # Fields that don't use standard internal types might not be
>             # unnest'able (e.g. array and geometry types are known to be
>             # problematic).
>             or any(
>                 (field.target_field if field.is_relation else
> field).get_internal_type()
>                 not in self.connection.data_types
>                 for field in fields
>             )
> }}}
>
> but more general since `get_placeholder` can be a function of `value`, we
> can't UNNEST forall `fields, value_rows`, as it breaks the "one
> placeholder fits all values" assumption.
>
> Therefore if any field in fields has get_placeholder, we should fallback
> to the default, unoptimised implementation.
>
> In fact the addition of
> {{{
>             # Field.get_placeholder takes value as an argument, therefore
> the
>             # resulting placeholder might be dependent on the value.
>             # in UNNEST requires a single placeholder to "fit all values"
> in
>             # the array.
>             or any(hasattr(field, "get_placeholder") for field in fields)
> }}}
>
> before the mentioned internal_type filter that's already present, handles
> all cases in the test suite and never hits the internal_type case,
> because all test examples also have a get_placeholder. I don't know if
> this will always be the case, so I suggest keeping the present filter
> too.
>
> I bumped into this regression when upgrading to 5.2 and the call to a
> Postgres extension function in such a placeholder, but which has a
> db_type that is present in self.connection.data_types, didn't happen
> anymore for bulk inserts.
>

> **NB This bug can go unnoticed: bulk_create might succeed, but with
> different data written to disk than what is expected. And subsequent
> reads and seeks will have the wrong results.**

New description:

 In order to allow fields to define a different placeholder than `"%s"` in
 
https://github.com/django/django/blob/97acd4d2f92eef8c285bac070d437bf0fd52e071/django/db/models/sql/compiler.py#L1769
 `BaseSQLInsertCompiler.assemble_as_sql` does:

 {{{
         get_placeholders = [getattr(field, "get_placeholder", None) for
 field in fields]
 }}}

 and calls those functions with `(value, compiler, connection)` to generate
 the placeholders.

 These alternative placeholders are not always UNNESTable. e.g. `"%s::{}"`
 for arrays and and ranges, which we tried to work-around by filtering

 {{{
             # Fields that don't use standard internal types might not be
             # unnest'able (e.g. array and geometry types are known to be
             # problematic).
             or any(
                 (field.target_field if field.is_relation else
 field).get_internal_type()
                 not in self.connection.data_types
                 for field in fields
             )
 }}}

 but more general since `get_placeholder` can be a function of `value`, we
 can't UNNEST forall `fields, value_rows`, as it breaks the "one
 placeholder fits all values" assumption.

 Therefore if any field in fields has get_placeholder, we should fallback
 to the default, unoptimised implementation.

 In fact the addition of
 {{{
             # Field.get_placeholder takes value as an argument, therefore
 the
             # resulting placeholder might be dependent on the value.
             # in UNNEST requires a single placeholder to "fit all values"
 in
             # the array.
             or any(hasattr(field, "get_placeholder") for field in fields)
 }}}

 before the mentioned internal_type filter that's already present, handles
 all cases in the test suite and never hits the internal_type case, because
 all test examples also have a get_placeholder. I don't know if this will
 always be the case, so I suggest keeping the present filter too.

 I bumped into this regression when upgrading to 5.2 and the call to a
 Postgres extension function in such a placeholder, but which has a db_type
 that is present in self.connection.data_types, didn't happen anymore for
 bulk inserts.

--
Comment:

 Thanks for the excellent report and patch Chris!

 The only part missing is a release note for 5.2.9 as this is a bug in a
 newly introduced feature and thus still qualifies for backport.

 > I don't know if this will always be the case, so I suggest keeping the
 present filter too.

 I also think it's worth keeping both as well as they are cheap to perform
 and could cover other cases in the wild.

 Small note to me that this relates to #36727 but the latter won't fix
 issue as, just like you said, `get_placeholder` can be a function of
 `value`. This only means we'll need even more logic to handle the presence
 of either `get_placeholder` or `get_placeholder_sql`.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36748#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/0107019aa1e902ea-e7138d4a-a672-4dd4-9eb7-e294884551a9-000000%40eu-central-1.amazonses.com.

Reply via email to