Re: [sqlalchemy] keeping long filter statements easily readable?
On Sun, May 27, 2012 at 4:37 PM, Julien Lacroix ne...@aradriel.de wrote:

> I've stumbled across some of my old queries and had trouble reading my
> pile of where statements. The editor's word-wrap function just makes
> things worse. What's your advice for keeping long filter statements
> readable? One where statement per line would be great, e.g.:
>
>     obj = dbObj.query(table).filter(table.foo1 == False AND include
>     more filter statements here).order_by(table.foo2).all()

I tend to write my SA queries like this:

    obj = (session.query(dbObj)
           .filter(dbObj.foo1 == False)
           .filter(sa.or_(dbObj.foo2 == 'abc', dbObj.foo3 == 'def'))
           .order_by(dbObj.foo2)
           .all())

Remember that multiple filter clauses are automatically AND-ed together.

Simon
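[Editor's note: a minimal, self-contained sketch of the chained-filter style Simon shows. The User model and in-memory engine are illustrative assumptions, not from the thread:]

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class User(Base):  # hypothetical model for illustration
        __tablename__ = 'users'
        id = sa.Column(sa.Integer, primary_key=True)
        name = sa.Column(sa.String(50))
        active = sa.Column(sa.Boolean)

    engine = sa.create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # Wrapping the whole expression in parentheses lets each method call
    # sit on its own line; successive .filter() calls are AND-ed together.
    users = (session.query(User)
             .filter(User.active == False)
             .filter(sa.or_(User.name == 'abc', User.name == 'def'))
             .order_by(User.name)
             .all())

    # Equivalent single-filter form, with the AND made explicit:
    users2 = (session.query(User)
              .filter(sa.and_(User.active == False,
                              sa.or_(User.name == 'abc',
                                     User.name == 'def')))
              .order_by(User.name)
              .all())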
Re: [sqlalchemy] Session management with mod_wsgi and webapp2
On Mon, May 28, 2012 at 12:58 AM, Bradley Mclain bradley.james.mcl...@gmail.com wrote:

> On Saturday, May 26, 2012 9:31:55 AM UTC+10, Michael Bayer wrote:
>> On May 24, 2012, at 11:28 PM, Bradley Mclain wrote:
>>> Hi, I'm currently working on a project that uses SQLAlchemy with
>>> mod_wsgi, webapp2 and prestans (a Python REST framework). The backend
>>> is Oracle XE using cx_Oracle. What is the appropriate way to manage
>>> sessions in a multithreaded app such as this? Currently I have been
>>> following the guide found here to implement a contextual/thread-local
>>> session.
>>
>> scoped_session produces a session factory/proxy object which maintains
>> a single Session per thread, and as the guide indicates, you can allow
>> this factory to automatically create a Session on first use; then at
>> request-end time you do an unconditional remove() so that the
>> scoped_session closes the Session it may or may not be handling, and
>> disposes of it.
>>
>>> I have then made a session that is automatically cleaned up by the
>>> request handler. The code for this looks something like:
>>>
>>>     def dispatch(self):
>>>         self.db_session = db.Session()
>>>         super(BaseHandler, self).dispatch()
>>>         self.db_session.close()
>>>         db.Session.remove()
>>
>> db.Session.remove() should be enough here; it calls close() for you.
>>
>>> However, this doesn't seem to clean up all database connections, as
>>> viewed in netstat,
>>
>> The Session uses an Engine as a source of connections. The Engine is
>> introduced at http://docs.sqlalchemy.org/en/rel_0_7/core/engines.html .
>> Intrinsic to its behavior is that it uses a small connection pool which
>> by default holds onto five of the total accumulated connections. Full
>> information on how this pooling is configured or disabled is at
>> http://docs.sqlalchemy.org/en/rel_0_7/core/pooling.html .
>>
>>> and I get a "connections exhausted" error from Oracle XE after using
>>> the application for a while. The same thing also happens when I use
>>> MySQL.
>>
>> One of the many options here is the pool_recycle option, which places
>> a time limit on the age of connections.
>>
>>> If I add a call to db.engine.dispose() after the handler has returned,
>>> this appears to fix the problem, but the SQLAlchemy documentation
>>> indicates that this is only for use with dropped connections or in
>>> test suites.
>>
>> I'm glad the documentation made clear that dispose() is not generally
>> appropriate, as the Engine is a factory/registry for connections, not a
>> connection itself.
>
> Thanks for confirming my apprehensions about using engine.dispose(). So
> I gather it is just a matter of tweaking the pool_recycle and pool_size
> options to something more appropriate for my application? Can you offer
> any guidance as to appropriate values for a threaded REST application?
> Could pool_size be quite small, like 2, and pool_recycle something like
> 2 minutes?

How many connections are alive when you get the "connections exhausted" error? Is it more than pool_size + max_overflow (default 15)?

Also, are you using multiple processes with mod_wsgi? If so, remember that each process will have its own pool, so if you had 2 processes, you could have up to 30 connections open.

Simon
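[Editor's note: a minimal sketch of the scoped_session-per-request pattern with the pool options discussed above. The connection URL, pool numbers, and handler wiring are illustrative assumptions, not recommendations from the thread:]

    import sqlalchemy as sa
    from sqlalchemy.orm import scoped_session, sessionmaker

    # pool_size + max_overflow caps the connections a single process
    # will open; pool_recycle closes and replaces any connection older
    # than the given number of seconds.
    engine = sa.create_engine(
        'oracle+cx_oracle://user:pass@localhost/xe',  # hypothetical URL
        pool_size=2,
        max_overflow=3,
        pool_recycle=120,
    )

    # Thread-local session registry: each thread gets its own Session.
    Session = scoped_session(sessionmaker(bind=engine))

    def handle_request():
        try:
            session = Session()  # created lazily, once per thread
            ...                  # do work with the session
        finally:
            Session.remove()     # closes the Session and returns its
                                 # connection to the pool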
[sqlalchemy] Re: Lock table, do things to table, unlock table: Best way?
The unique constraint sounds like a workable solution! I'll implement that with a try/except and report back on whether it was effective. Thanks!

On May 28, 5:43 am, Simon King si...@simonking.org.uk wrote:
> On Sun, May 27, 2012 at 6:18 PM, Jeff jeffalst...@gmail.com wrote:
>> Thanks, I have indeed spent a lot of time looking at SELECT FOR UPDATE,
>> but as far as I can tell that locks rows that have already been
>> selected. That is not helpful in this use case, in which the issue is
>> rows not existing and then later existing. Am I misunderstanding?
>>
>> On May 27, 11:48 am, A.M. age...@themactionfaction.com wrote:
>>> On May 27, 2012, at 1:07 AM, Jeff wrote:
>>>> I have multiple processes accessing a table. All of these processes
>>>> want to read a set of rows from the table, and if the rows are not
>>>> present they will make a calculation and insert the rows themselves.
>>>> The issue comes when process A queries to see if the target set of
>>>> rows is present in the table, finds that they're not, and starts
>>>> calculating. While it's calculating, process B inserts the rows.
>>>> Then process A inserts the rows, and now we have two copies of these
>>>> sets of rows. Bad.
>>>
>>> You should look at SELECT FOR UPDATE.
>>> http://docs.sqlalchemy.org/en/rel_0_7/orm/query.html?highlight=lockmo...
>>> Cheers, M
>
> Could you put unique constraints on the table so that the second process
> gets an error when it tries to insert the duplicate rows? It won't
> prevent you from performing the calculation twice, but at least you
> won't get the duplicates.
>
> Another option would be to write some sort of "pending" marker into the
> table, so that subsequent processes know that the result is already
> being calculated.
>
> Simon
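[Editor's note: a sketch of what the unique-constraint approach could look like. The Result model and its columns are hypothetical, not from the thread:]

    import sqlalchemy as sa
    from sqlalchemy.exc import IntegrityError
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Result(Base):  # hypothetical table for illustration
        __tablename__ = 'results'
        id = sa.Column(sa.Integer, primary_key=True)
        param_a = sa.Column(sa.Integer, nullable=False)
        param_b = sa.Column(sa.Integer, nullable=False)
        value = sa.Column(sa.Float)
        # No two rows may share the same (param_a, param_b) pair.
        __table_args__ = (sa.UniqueConstraint('param_a', 'param_b'),)

    engine = sa.create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    session.add(Result(param_a=1, param_b=2, value=3.0))
    session.commit()

    # A second process inserting the same key raises IntegrityError.
    session.add(Result(param_a=1, param_b=2, value=9.9))
    try:
        session.commit()
    except IntegrityError:
        session.rollback()  # discard the duplicate and move on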
[sqlalchemy] Multiple inserts with .append()'d associations
There are Analyses, which have Events. There's an Analysis_Event_Association table. Events are written before the Analyses. Later, I have a for loop calculating multiple Analyses. For each Analysis, we identify the Events it includes and append them, using SQLAlchemy's relationship plumbing. This just writes the appropriate rows to the association table. It works great:

    all_events = list(session.query(Event).filter().order_by(...).all())
    for a in analyses_parameters:
        analysis = Analysis()
        events_in_this_analysis = all_events[...]
        analysis.events = events_in_this_analysis
        session.add(analysis)
    session.commit()
    session.close()
    session.bind.dispose()

The issue is that I'm in a situation where there are many, many processes writing to the same database, and the database only allows us a limited number of connections. So I'm trying to have connections open only when they're actually being used, hence the session.bind.dispose().

Writing all the Analyses is a single insert, and thus is efficient with regard to holding a connection open. HOWEVER, it appears that each of the Event associations we appended is being inserted individually, which is bad: the connection sits there, then is used, then sits there, then is used, etc. This contributes to the number of connections held open, which is a limited resource. I'd like all the appended Event associations to be inserted in one go. Is there a way to do this? Am I correct about what's going on?
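[Editor's note: the mapping the question implies might look something like the following sketch. The table and column names are guesses adapted from the post, not the poster's actual schema:]

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()

    # Plain many-to-many association table between analyses and events.
    analysis_event_association = sa.Table(
        'analysis_event_association', Base.metadata,
        sa.Column('analysis_id', sa.Integer, sa.ForeignKey('analyses.id')),
        sa.Column('event_id', sa.Integer, sa.ForeignKey('events.id')),
    )

    class Event(Base):
        __tablename__ = 'events'
        id = sa.Column(sa.Integer, primary_key=True)

    class Analysis(Base):
        __tablename__ = 'analyses'
        id = sa.Column(sa.Integer, primary_key=True)
        # Assigning a list of Event objects to .events causes rows to be
        # written to the association table at flush time.
        events = relationship(Event, secondary=analysis_event_association)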
Re: [sqlalchemy] Re: Lock table, do things to table, unlock table: Best way?
An option to add alongside the unique constraint, if you expect to get collisions often, is to use a SAVEPOINT, so that a process can roll back partially if this particular INSERT fails, then use the existing row. The Session offers SAVEPOINT via begin_nested():

    session.begin_nested()
    try:
        session.add(thing_that_may_exist_already)
        session.commit()  # flushes, and commits only the savepoint
    except exc.IntegrityError:
        session.rollback()
        thing_that_may_exist_already = \
            session.query(Thing).filter_by(criterion).one()

The difference between using locks to prevent concurrent dupes and using constraints while expecting dupes to fail is known as pessimistic versus optimistic locking, respectively.

On May 28, 2012, at 10:38 AM, Jeff wrote:

> The unique constraint sounds like a workable solution! I'll implement
> that with a try/except and report back on whether it was effective.
> Thanks!
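[Editor's note: a runnable expansion of the SAVEPOINT pattern above. The Thing model, its unique column, and the get_or_create wrapper are illustrative assumptions:]

    import sqlalchemy as sa
    from sqlalchemy import exc
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Thing(Base):  # hypothetical model for illustration
        __tablename__ = 'things'
        id = sa.Column(sa.Integer, primary_key=True)
        name = sa.Column(sa.String(50), unique=True)

    engine = sa.create_engine('postgresql://user:pass@localhost/mydb')  # hypothetical URL
    session = sessionmaker(bind=engine)()

    def get_or_create(session, name):
        # Optimistic approach: attempt the INSERT inside a SAVEPOINT; if
        # a concurrent process won the race, roll back to the savepoint
        # and load the existing row instead.
        session.begin_nested()
        try:
            thing = Thing(name=name)
            session.add(thing)
            session.commit()    # flushes, commits only the savepoint
        except exc.IntegrityError:
            session.rollback()  # rolls back to the savepoint
            thing = session.query(Thing).filter_by(name=name).one()
        return thing

    thing = get_or_create(session, 'example')
    session.commit()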
Re: [sqlalchemy] Multiple inserts with .append()'d associations
On May 28, 2012, at 10:52 AM, Jeff wrote:

> The issue is that I'm in a situation where there are many, many
> processes writing to the same database, and the database only allows us
> a limited number of connections. So I'm trying to have connections open
> only when they're actually being used, hence the session.bind.dispose().

engine.dispose() is an inefficient operation, as the Engine is a factory for connections, not a connection itself. If you'd like there to be zero actual database connections open when the application is idle, you can disable pooling using NullPool.

> Writing all the Analyses is a single insert, and thus is efficient with
> regard to holding a connection open. HOWEVER, it appears that each of
> the Event associations we appended is being inserted individually,
> which is bad: the connection sits there, then is used, then sits there,
> then is used, etc. This contributes to the number of connections held
> open, which is a limited resource.

Just to make sure we're on the same page: the Session procures a single DBAPI connection and uses it for the lifespan of that transaction, so the above code is using only one connection until COMMIT is called. Unless there is a significant period of latency in the loop, it would be far less efficient, and would also defeat transactional guarantees, to use a new connection for each INSERT (though if your app is using MyISAM tables, the transactional guarantees are moot in any case). Breaking up a tight loop to reconnect over and over for each record inserted, in an effort to use fewer concurrent connections, is very unlikely to help, as you're adding a lot more work and time to the operation.

> I'd like all the appended Event associations to be inserted in one go.
> Is there a way to do this? Am I correct about what's going on?

SQLAlchemy 0.7 will perform the INSERT of all the above records as a single prepared statement if you assign a primary key value to the objects ahead of time; the ORM no longer needs to fetch the newly inserted primary key in that case, so you'd see a single statement sent to the DBAPI, which then executes it in a more efficient manner for each set of parameters (typically using prepared-statement mechanics). It still ultimately emits individual INSERT statements over the wire, but at a lower, more performant level on the Python side.

Or, you can get the same effect by inserting the records using a non-ORM executemany, in which case you wouldn't need to pre-assign primary key values:

    session.execute(Event.__table__.insert(),
                    params=[{'a': evt.a, 'b': evt.b, ...} for evt in events])

The disadvantage of the above is that the Session doesn't know about these newly inserted Event objects. You'd need to make sure you don't add equivalent records to the Session itself, or they will conflict; and if you want to work with these Event objects in the Session immediately after the above operation, you'd need to load them in (as that would be the only way to get at their newly generated primary key values).
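[Editor's note: a short sketch of the NullPool option mentioned above, combined with a Core-level executemany. The table layout, URL, and row values are illustrative assumptions:]

    import sqlalchemy as sa
    from sqlalchemy.pool import NullPool

    # NullPool disables pooling entirely: each checkout opens a fresh
    # DBAPI connection and each checkin closes it, so an idle process
    # holds zero database connections.
    engine = sa.create_engine('sqlite:///events.db',  # hypothetical URL
                              poolclass=NullPool)

    metadata = sa.MetaData()
    events_table = sa.Table(
        'events', metadata,
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('a', sa.Integer),
        sa.Column('b', sa.Integer),
    )
    metadata.create_all(engine)

    # Core-level executemany: one execute() call with a list of parameter
    # dicts; the DBAPI batches the rows at its efficient executemany level.
    conn = engine.connect()
    trans = conn.begin()
    conn.execute(events_table.insert(),
                 [{'a': i, 'b': i * 2} for i in range(1000)])
    trans.commit()
    conn.close()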