Re: ORM refactoring

Anssi Kääriäinen Mon, 10 Oct 2011 07:59:40 -0700


On Oct 10, 4:13 pm, Luke Plant <l.plant...@cantab.net> wrote:


> One of the problems is that it can be very hard to review refactorings.
> For example, I recently checked in rev 16929 [1] from ivan_virabyan's
> patch, and reviewing it was hard, despite the fact that it was a very
> good quality patch. The difficulty is that there appear to be a large
> number of changes, when in reality the code has just moved around. But
> confirming that it has indeed stayed the same is very time consuming if
> you do it properly - in fact it seems almost faster to do the same
> changes yourself, and verify you get the same result.

Well, the patch doesn't move a lot of code - it does actually change
the code. Not that it makes it any easier to review the patch :)

> This is only thinking out loud, but I'm wondering if that approach might
> not be too bad. If instead of a large patch, we have a series of smaller
> steps (in a github branch or whatever), perhaps that would be easier to
> review? I'm not saying rework any already created patches to match this,
> BTW.

I too support the idea of smaller patches. But the problem with the
ORM is that the bigger changes tend to cascade. Or at least that was
what happened to the where node refactoring. It might be that the
patch is too big to swallow. But the benefits are real: django
benchmark suite shows 4-5x speed improvement for chained queryset
construction. And 1.5x improvements in many cases. Those are
differences important to many users. Details in the ticket.

> We do have pretty good test coverage, which should be a good help, but
> one of the problems here is the mutability thing - it is very difficult
> to write a test that shows that shows that a new QuerySet is not sharing
> data structures with the old in a way that is incorrect and could lead
> to the previous QuerySet being modified by the new one (or subsequent ones).

The patched version actually does share structures - the leaf nodes of
the where tree. They are only cloned when relabel_aliases is called,
and that together with avoiding deepcopy is where the speed difference
comes. Ok, the where trees are also kept as small as possible, so that
too affects performance.

> It was me who suggested the idea of basing the ORM on SQLAlchemy in
> future, on my blog [2]. It would be a very large change, especially
> things related to the unit-of-work stuff, and I have no idea whether it
> is even feasible really.
>
> My concern is that the ORM currently leaves you high and dry when you
> come to the end of the queries it can generate. Some people say the
> answer is raw SQL at that point, but if you have built up your query
> dynamically in any way, or perhaps in a completely Model agnostic way
> (as I was doing in django-easyfilters), you are completely stuck,
> because you are left with a dynamically created query that you really
> cannot usefully manipulate in any way, but which you must manipulate in
> order to do the queries you need. I think the best you could do is
> generate the SQL from the query as a string (which is actually hard to
> do currently due to the way that SQL generation interacts with the
> different backends), parse it again with python-sqlparse, manipulate and
> pass to QuerySet.raw(). And then somehow cope with the backend specific
> stuff. Ugh.
>
> SQLA, I believe, would be very different, because 'core' provides a
> complete wrapper around SQL, and the ORM allows you to drop down to core
> to do advanced things. And it has a much nicer way of dealing with the
> different SQL dialects.

In the perfect world you could have a SQLA backend, or qs.as_sqla()
method. That would be neat, yes. QuerySet chaining could be hard to
implement, though (or does SQLA have support for something like
that?). But in my opinion this is another argument for refactoring the
ORM. When the ORM has better separation of concerns and better
abstractions between compiler and sql/query.py it will be easier to
write different compilers - be it a compiler for Cassandra or SQLA.

> I do think that these refactorings you are doing are valuable. However,
> if you work on this, I think you need to be prepared for the possibility
> that the work may be a dead end. It could be an experiment that shows
> that a different approach will be needed. If it does that, it will still
> have been useful work, but you will need a certain kind of attitude to
> still consider it useful work if we end up switching to SQLA :-)

I am fine with the code being discarded, no problems with that. I just
want it to be at least somewhat useful... My biggest worry is not
getting any response at all. That means I do work which is not useful
at all.

And trust me, I have already met some dead ends in my refactorings.
You may not have seen them, but they do exist... :)

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Re: ORM refactoring

Reply via email to