So,...

On Monday 17 June 2013 16:02:05 Shai Berger wrote:
> On Thursday 16 May 2013 03:51:05 Shai Berger wrote:
> > Ticket #20414[0] and PR #1071[1].
> 
> Over a month, and no comment from anyone on these; I take it as sign that
> nobody has time for it. Unless someone wakes up quite quickly to oppose,
> I'm going to commit essentially the PR [...]

Nobody opposed. Tim Graham reviewed the PR and made some helpful comments -- 
thanks, Tim. But before committing, I wanted to run again some benchmarks, to 
see how the recent changes in query processing affect these performance 
enhancements.

I found what Anssi had found, way back when[2]: Switching to outputtypehandler 
makes almost no difference for ints, makes float columns (FloatFields) about 
2.5 
times faster, and makes decimal columns (DecimalFields and raw expressions) 
about 2 times slower. I found that baffling, and investigated.

Some background: Oracle number columns support 38 significant digits, which is 
significantly more than supported by Python floats (=C doubles). cx_Oracle 
supports decimal.Decimal as an input type, but not as an output type (as far 
as I could see) -- so to get accurate results when selecting such columns, one 
must read these numbers as strings and convert them to numbers later. There 
are two ways to do this -- either use an outputtypehandler, which lets us 
define how we want to process each column; or use numbersAsStrings, which tells 
cx_Oracle to return _all_ numbers as strings, to be processed later in Python.

The part where ints do not improve is explained quite easily: cx_Oracle 
returns ints as (small, C) ints only if it knows they will fit -- which means, 
if they have less than 10 digits. IntegerFields (incl. AutoField, 
OneToOneField, and, surprisingly, even SmallIntegerField)  are all NUMBER(11), 
and INTEGER is actually NUMBER(38); and so they get converted to a python long 
int. Python makes this transparent, so nobody notices, but the conversion 
through string happens whether numbersAsString is used or not. If we had 
chosen NUMBER(9), some C ints would not fit in, but everything would be a lot 
faster. But that is mostly water under the bridge now[3].

The part where decimals become slower -- I'm still baffled by that. Anssi 
claimed it was due to calling a function for each value separately (with 
numbersAsStrings, a function is only called per row). But this makes little 
sense -- we see the slow-down in a benchmark with one-column rows.

I tried to get the best of both world by only using numbersAsStrings when 
NUMBER() columns are present -- but that doesn't work; the setting needs to be 
made before the execute() call in order to take effect, and at that time we 
don't know what columns we are expecting.

Using an outputtypehandler opens the door to some user-initiated 
optimizations; one thing I noticed is that for large queries (returning 
thousands of rows), performance improves when using a larger cursor arraysize 
(the default is 100). This is not the typical query for a web application, but 
the Django ORM is used in batch processing as well.

As I said, I prepared a new PR, 1279[4]; the options, as I see them, are:

a) Make this change, sacrificing decimals for floats
b) Reject the change, sacrificing floats for decimals
c) Somehow let users choose (either globally or for specific queries) -- but 
then we still need to choose the default

Your advice is appreciated,

Shai.
 
[0] https://code.djangoproject.com/ticket/20414
[1] https://github.com/django/django/pull/1071
[2] https://groups.google.com/d/topic/django-developers/4BNkJyGez9A/discussion
[3] This optimization opportunity is no longer even available with Python 3, 
at least with cx_Oracle 5.1.2
[4] https://github.com/django/django/pull/1279

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-developers.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to