Re: Proposal: use SQLAlchemy Core for query generation

Asif Saifuddin Sat, 05 Dec 2015 23:37:54 -0800

Hi Luke,

I have been thinking and researching a lot recently about SQLalchemy and 
django integration.


Now SQLalchemy provides a devlarative orm extension too quite identical to 
django orm[not same], django has also refactored a lot in db internals with 
model meta refactore, expressions, new migrations and many stuff.

There would be 2 new approach for your proposal--

1) Probably the people used to/willing to use sqlalchemy's declarative orm 
extension will use that and there will be needed to create new model meta 
API driven abstraction layer which will help to show sqlalchemy generated 
models in django admin and django form/model form.
  * migrations would be handled by alaembic/sqlalchemy-migrate/whatever 
best available

2) Django Model ORM will be  rewritten in a 3rd party project as a parallel 
counterpart/equivalent to SQLalchemy's declarative ORM extension which will 
work like django model//migrations/command based normal djangoish workflow 
but under the hood will be used by sqlalchemy core/3rd party tools.

I'm willing to address the both issue in a solid 3rd party package. In both 
case propably django internal too needed to be ironed for edge cases.

I'm trying to tracking this thoughts in a git repo and also looked for some 
older attempts to understand the implementation way.

Looking forward to hear your thoughts.


Regards,

Asif



On Saturday, June 30, 2012 at 8:22:21 PM UTC+6, Luke Plant wrote:
>
> Hi all, 
>
> A good while back I put forward the idea of using SQLAlchemy Core in 
> Django [1]. Having had more experience working with SQLAlchemy, I'm 
> putting that idea forward as a formal proposal, as I mentioned in a more 
> recent thread here. 
>
> Apologies in advance for the length! I've included a few 'TL;DR' 
> summaries and headings in the different sections which you might want to 
> scan first. 
>
> === Proposal === 
>
> We should re-write our query generation code to use SQLAlchemy Core. 
> This includes DDL queries as well as DML (e.g. CREATE TABLE as well as 
> SELECT, INSERT etc). 
>
> This would also involve replacing our database connection objects with 
> SQLAlchemy's. In this proposal, our high-level ORM, with model 
> definition and query API, would remain the same - we wouldn't be using 
> the SQLAlchemy ORM. 
>
> This is a "Django 2.0" proposal i.e. not immediate future, and not fully 
> backwards compatible. 
>
> === Background - Django 2.0 philosophy === 
>
> TL;DR: "evolution not revolution, some backwards incompatibilities" 
>
> Although we haven't really discussed the timing of Django 2.0, or its 
> philosophy, I think we should be doing. My own assumption is that it 
> should be evolution, not revolution, similar to Python 3, where we break 
> stuff that has dogged us in the past, and will hamper us going forward, 
> but don't make radical changes where not necessary. This proposal fits 
> that basic assumption. 
>
> Also, in Django to date we've eschewed external dependencies. That has 
> been partly due to the poor and confusing state of Python packaging, 
> which is hopefully improving. I think it will make us a very poor Python 
> citizen if we don't reverse this policy at some point, and Django 2.0 is 
> an obvious point to do it. 
>
> This proposal does not represent a move away from being a 'full stack' 
> framework. "Full stack" does not mean "no dependencies". 
>
> Our current recommended installation method is via pip [2], and we would 
> be doing our users a disservice to recommend otherwise. Installation via 
> pip actually makes the instructions shorter - manual methods require 
> things like removing old versions manually - and for anything beyond 
> trivial use of Django you have to know pip anyway. 
>
> So, with our recommended installation method, adding a dependency 
> doesn't make things any more difficult at all. 
>
> === Background - ORM philosophy === 
>
> TL;DR: "Let's make Django's DB layer the best it can be for relational 
> databases." 
>
> Whether we call it "the ORM" or "the model layer" as some people prefer, 
> I think it's fairly certain that the overwhelming majority of our users 
> are using relational databases. 
>
> Many of the things that make Django a compelling choice, 
> including the admin and re-usable apps, either don't work or are of 
> little use if you are not using a relational database. 
>
> So my philosophy is that we should aim to provide a really excellent 
> ORM that will take users as far as possible. 
>
> This doesn't preclude having non-relational support in Django. But 
> it seems very strange to make that the focus, when we've had 
> little-to-no support for it so far, or to allow that support to limit 
> how well we can cater for the 99%. 
>
> === Motivation === 
>
> == Motivation 1: Django's ORM leaves you high and dry when you reach its 
> limits. 
>
> While the ORM can do a surprising number of queries, there are plenty it 
> can't, and in all the medium-to-large projects I've worked on I've gone 
> beyond what the ORM can do. 
>
> At this point, you've got a few options, from easiest to hardest: 
>
> A) Do the aggregation/filtering etc in Python. 
>
> B) Write raw SQL. 
>
> C) Use SQLAlchemy or some other SQL generation tool. 
>
> D) Write a patch to extend the ORM. 
>
>
> None of these is great: 
>
> A) Data manipulation in Python, when it could be done in SQL, is 
> obviously a bad idea since it is usually very inefficient. But I've seen 
> a lot of code that does this, because it was hard/impossible to get 
> Django's ORM to do the query needed. 
>
> This anti-pattern will also give Django applications the reputation for 
> being slow. Obviously, we can point the finger at the developer, but if 
> we've made it hard for the developer to do the right thing, that is 
> unfair. 
>
> B) Raw SQL fails if you have dynamic queries i.e. where the shape of the 
> query can vary. 
>
> Example 1: you are writing library code e.g. a re-usable app that knows 
> nothing about the tables it is actually querying, and may have been 
> given any arbitrary QuerySet as an input to manipulate in some way. 
>
> Example 2: even if you have full knowledge of the tables, you might have 
> additional WHERE clauses/JOINs/sub queries in some cases, that you want 
> to programmatically add to the query. 
>
> I've come across both these types in projects I've been involved in, and 
> I know I'm far from the only one. 
>
> Raw SQL can also fail if you are manually writing 'static' queries but 
> need compatibility with multiple DB backends. 
>
> C) For SQL generation, SQLAlchemy is the best, but for good reason it 
> comes with its own database connection objects. Having two sources of 
> connection objects causes problems, such as queries not seeing what was 
> done in the transaction established by the other connection. 
>
> This can be especially painful for tests even if you're not making much 
> use of transactions in your live system, since it leads to needing the 
> much slower TransactionTestCase and hacks to make it work. 
>
> D) This is basically too hard for most of us, including core developers. 
> One reason is that the the code involved is not in the best shape. (More 
> below. This might change if Anssi's work succeeds, but I still have big 
> doubts about how far we can push our ORM). 
>
>
> So, for these reasons, when you reach the end of what the ORM can 
> provide, you are stuck with a lot of poor or painful options. It sucks 
> to try to explain to a customer that although the feature they want is 
> relatively straightforward, and easily supportable by the major 
> component (the database), the limitations of the tool you are using make 
> it impractical. It *really* sucks when the tool in question is Django. 
>
> If SQLAlchemy was used for our query generation, however, it would be 
> easy to get a query object for 'the query so far' and start from there, 
> using the full power of SQLAlchemy Expression Language to build whatever 
> query you need. 
>
> == Motivation 2: More db backends 
>
> SQLAlchemy has a load of code to cope with different DB dialects, and we 
> have some too. This is largely overlapping, which is duplicated work. 
> It's not entirely overlapping though, and there are some people who 
> can't use one of Django/SQLAlchemy due to choice of DB. If we combined 
> work, both projects would benefit - SQLAlchemy would get a large number 
> of people now interested in fixing whatever backend support is missing 
> in their code, and we'd get all their existing backends for free. 
>
> Mike Bayer is enthusiastic about this possibility [3]. 
>
> == Motivation 3: Code cleanup 
>
> As mentioned, our own query generation has been pushed to the limit and 
> beyond. (I'm talking about classes like Query, SQLCompiler). It has 
> grown and grown, so that the Query class is now a 2000 line behemoth, 
> containing a constructor with over 40 assignments to 'self'. 
>
> Most of the core developers are scared to touch this stuff, AFAICS, 
> myself included. It has no virtually no unit tests. While it has very 
> high test coverage, it is tested only by tests that check QuerySet and 
> other high-level APIs. 
>
> As such, it's very difficult to change, and it may well be beyond our 
> ability to successfully refactor. 
>
> Switching to SQLAlchemy would force us to rewrite this code, which is 
> for our own good. In addition, large chunks of it can be dropped 
> entirely (i.e. most of database specific stuff). This will reduce our 
> maintenance load going forward (eventually). 
>
> (BTW, I'm not saying that we should let the existing code continue to 
> rot, we should of course try to clean it up as best we can, and that 
> effort is not wasted - I'm talking about a longer term strategy here. 
> If we can refactor this code, great - this motivation can be dropped 
> from the list, but I think the others still stand). 
>
> == Motivation 4: Python community 
>
> Having another project as a dependency would be a good thing. It would 
> prevent Django becoming a walled garden, and give us greater exposure to 
> the broader talent in Python-world. SQLAlchemy itself is a fantastic 
> library, from which we can learn a lot. 
>
> === Cost/risk for us === 
>
> There is no doubt this would require a lot of work, but I think it is 
> achievable. Both SQLAlchemy and Django have connection objects that 
> 'mostly' implement the DBAPI standard, which will help with the database 
> connection stuff. 
>
> Also, *not* doing this may represent a greater cost for us and our users 
> in the long run. There is no doubt that this layer of our code 
> represents a large amount of technical debt that we need to deal with at 
> some point. 
>
> We may be able to re-use a lot of the work that has already been done in 
> django-sqlalchemy [4], which is similar to this proposal in some ways, 
> or at least has done some of mapping. 
>
> It's possible that we won't be able to remove as much code as I'm 
> suggesting, leading to us having our own database-compatibility layer 
> on top of SQLA's, which might make our code ugly and harder to maintain 
> than doing it all ourselves. 
>
> For example, Oracle doesn't support LIMIT/OFFSET, and we have our own 
> workaround. SQLA also has a workaround that works transparently (I think 
> it is basically the same SQL), so we can delete all our code for this. 
> However, if there is a workaround we've got that SQLA doesn't have, or 
> doesn't apply transparently, or works differently to ours in an 
> important way, we might still need our own 'dialect' classes.  It may be 
> possible to get SQLA to accept patches for this in some cases, or use 
> SQLA's extension points to modify it's behaviour e.g. [5]. Mike Bayer 
> would definitely work with us on this (I've emailed him about it). 
>
> Finally, this change would probably require us to start helping out with 
> SQLAlchemy, at the very least investing enough to be able to effectively 
> contribute patches etc. 
>
>
> === Implementation === 
>
> (You can skip this section, it's just rough ideas I've come up with in 
> trying to assess the amount of work needed). 
>
> There are different strategies I can think of to break it down into 
> steps that will leave Django roughly working after each major step. Here 
> are some: 
>
> == Strategy 1: 
>
> The most direct, and in some ways the simplest, but requires a few huge 
> jumps. 
>
> 1) Replace all our connection objects with SQLAlchemy's. Each individual 
> backend (sqlite, postgres etc) would return SQLA connection objects. 
> All SQL generation would still be done in Django. 
>
> 2) Replace DDL SQL generation with SQLA Core. 
>
> 3) Replace DML SQL generation with SQLA Core. 
>
> 4) Replace backend/*/introspection.py 
>
> == Strategy 2: 
>
> 1) Rewrite one of our backends (probably SQLite) so that it uses SQLA to 
> connect to SQLite. SQLite is a good choice because we tend to use it for 
> running the test suite fast, but not for production, maximising the 
> number of people who might be willing to try out this branch. 
>
> 2) Rewrite its DDL SQL generation so it uses SQLA Expression Language. 
>
> 3) Write a SQLCompiler class for SQLite, in the same way that we have a 
> MySQL/Oracle SQLCompiler, but rewriting *all* the methods, to use SQLA 
> Expression Language to build the queries. 
>
> 4) Allow backends to provide their own QuerySet, and give SQLite its 
> own. The implementation delegates to a new Query class that uses SQLA to 
> build query objects. 
>
> 5) One-by-one, migrate all the other backends so that they inherit from 
> the SQLite backend and classes, added fixes in subclasses where 
> necessary. Move the fixes down into the backend where possible. 
>
> 6) Eliminate the whole 'backend' concept, pulling up the SQLite 
> implementation as the only implementation, renaming it etc, and removing 
> our own connection wrappers. 
>
> (Steps 3 and 4 might need to be combined/reversed). 
>
> == Strategy 3: 
>
> Same as strategy 2 except start by creating a 'SQLAlchemy' backend 
> (which can connect to any of the DBs via configuration). It might be 
> more confusing as it involves an extra layer of indirection. However, it 
> might be a better strategy, because you could actually develop this in a 
> branch that can be used in production, because just by a config change 
> you can use the new method. You could even merge this branch into master 
> regularly, which would encourage development. 
>
> Also, in this strategy, the first step - a SQLAlchemy backend that still 
> generates all queries the old way but has SQLAlchemy connection objects 
> - would be useful by itself. It would solve the problem of transactions 
> and a slow test suite for the case of a mixed Django/SQLA code base (see 
> above). 
>
> When the SQLAlchemy backend can fully pass the test suite for all 
> different DBs, and is using all SLQAlchemy stuff for SQL generation, it 
> will become the only backend, and the extra layer of indirection can be 
> removed, along with all the old code we are no longer using. 
>
>
> In all strategies, I haven't really thought about the contrib.gis and 
> the backends needed there. There are some SQLAlchemy libraries with GIS 
> extensions - e.g. GeoAlchemy [6] 
>
>
> === Negative impact for users === 
>
> There would be backwards incompatibilities for anyone using internals, 
> although quite a lot of them could be kept to a minimum with some shims 
> (probably temporary shims, which would be removed eventually). Some 
> documented APIs may be problematic e.g. QuerySet.extra(). These already 
> have various bugs, and it is possible that the re-write may expose the 
> bugs as being symptoms of fundamental flaws, rather than things we can 
> fix. We probably want to drop extra() for 2.0 anyway. 
>
> The level of incompatibilities, and the need for external dependency, 
> make me think that this is best suited for Django 2.0, 
>
>
> === Further work or alternatives === 
>
> This may or may not lead to adopting the SQLAlchemy ORM. This proposal 
> makes sense with or without that. That would be a much bigger change, 
> because it would change a lot publicly documented and highly used APIs, 
> and would introduce more subtle semantic changes due to things like 
> SQLA's Unit-of-Work etc. 
>
> I've emailed Mike Bayer about this proposal, and he is certainly happy 
> with the idea in general. He would want us to consider actually an 
> alternative, which is to use the SQLAlchemy ORM, not just Core, with 
> some backwards compatible wrappers for things like QuerySet. I wouldn't 
> be against that proposal, as I've heard great things about the SQLA ORM, 
> but I'm not in a position to recommend it myself, and it does introduce 
> some major semantic changes, even with a compatibility shim for syntax. 
>
> Another change this might lead to is getting rid of Q objects and F 
> objects, which are a poor man's way of referring to chunks of SQL, and 
> can at least be better implemented using SQLA's Expression Language, and 
> possibly removed altogether. 
>
> Finally, I should say thanks to Mike Bayer for the time he took just 
> looking at this proposal, and for his feedback, some of which I've 
> incorporated into this. 
>
>
> Comments? It would be good to get other developers' comments on the 
> whole idea of "Django 2.0" as well - when do we see it happening? 
>
>
> Regards, 
>
> Luke 
>
>
> [1] http://goo.gl/uKQe9 
> [2] http://goo.gl/gJP15 
> [3] http://goo.gl/NVRdw 
> [4] http://gitorious.org/django-sqlalchemy/ 
> [5] http://goo.gl/4Vhlb 
> [6] http://geoalchemy.org/index.html 
>
> -- 
> "Outside of a dog, a book is a man's best friend... inside of a 
> dog, it's too dark to read." 
>
> Luke Plant || http://lukeplant.me.uk/ 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/2a0ec504-8a4a-4655-9994-c0f8b64155cf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Proposal: use SQLAlchemy Core for query generation

Reply via email to