Re: [Django] #26683: Oracle DBs: Add option to use VARCHAR2 data type

Django Thu, 02 Jun 2016 05:13:16 -0700

#26683: Oracle DBs: Add option to use VARCHAR2 data type
--------------------------------+--------------------------------------
     Reporter:  elarandjelovic  |                    Owner:  nobody
         Type:  New feature     |                   Status:  new
    Component:  Migrations      |                  Version:  1.9
     Severity:  Normal          |               Resolution:
     Keywords:  Oracle          |             Triage Stage:  Unreviewed
    Has patch:  0               |      Needs documentation:  0
  Needs tests:  0               |  Patch needs improvement:  0
Easy pickings:  0               |                    UI/UX:  0
--------------------------------+--------------------------------------

Comment (by jarshwah):

Just about every database I've worked with uses VARCHAR rather than
NVARCHAR. I wasn't aware of performance issues between the two though.
Vackar, were the joins you're talking about joins between NVAR and VAR
fields, or just two tables that contained those fields. I'm fairly sure
you mean joining NCHAR to CHAR, but just want to be clear.

I guess the original decision to support NVARCHAR was due to the
possibility/frequency of oracle databases at the time not using utf8 (or
similar) encoding by default, so the NVARCHAR was a way to make unicode
work in all circumstances.

Recommendations mostly follow this from Ask Tom

(https://asktom.oracle.com/pls/apex/f%3Fp%3D100:11:::NO:RP:P11_QUESTION_ID:9462837200346048883)

> NVARCHAR2 is a unicode-only data type. It's useful if you want to have
unicode for some columns (those that are NVARCHAR2) but have the rest of
the database use a different characterset. In all other cases stick with
using VARCHAR2.

I think this is probably worth fixing provided a palatable solution can be
found. Simply querying for the character set is probably not enough. You
need to know this information every time you run a migration (or cache it
somewhere). If the character set is changed between migrations, now you
have different character types within the same application which is
arguably worse. I'm unsure if django needs to know what the type of
varchar column is at runtime or not, but if it does, that'd incur an extra
query at connection creation time.

Also, looking into the code a bit it looks like the
startswith/istartswith/contains based operators all expect NCHAR, and do
the appropriate conversions:

{{{
_standard_operators = {
'contains': "LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'icontains': "LIKE UPPER(TRANSLATE(%s USING NCHAR_CS)) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'startswith': "LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'endswith': "LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'istartswith': "LIKE UPPER(TRANSLATE(%s USING NCHAR_CS)) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
'iendswith': "LIKE UPPER(TRANSLATE(%s USING NCHAR_CS)) ESCAPE
TRANSLATE('\\' USING NCHAR_CS)",
}
}}}

I would think most of that TRANSLATE/USING would be unnecessary if we were
simply using varchar to begin with. There's probably other optimisations
we can make elsewhere if varchar was used. But these operators probably
prevent you from creating a useful model field yourself - I haven't tried
to see what happens if you do LIKE comparisons by translating char using
nchar.

Migrations between the two are possible, but it requires creating a new
field, migrating the data across, and then dropping the existing field.
For big tables that's going to be very expensive.

It seems to me that this is an all or nothing proposition. Since the
character encoding is database wide (not per user/schema), and there are
many very old installations of oracle around, we can't guarantee that an
existing django deployment will be able to continue on a version using
varchar if they aren't able to change the global character set.

I think this is going to require a new backend since there is no easy way
to dynamically switch behaviours. A new backend would be able to sweep out
a tonne of old work arounds that are probably not useful anymore. It'd
also be able to break backwards compatibility, because I don't see the
django organisation maintaining two oracle backends when maintaining the
current one is hard enough due to lack of oracle maintainers.

Sorry for the wall of text, but I kept jumping down new rabbit holes. As I
said earlier though, if a palatable solution can be found, I think it's
definitely worth looking into.

--
Ticket URL: <https://code.djangoproject.com/ticket/26683#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/072.898515f46925ac1e41a6c035c6b7553d%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Re: [Django] #26683: Oracle DBs: Add option to use VARCHAR2 data type

Reply via email to