I do have it set as REPEATABLE READ. However, I don't use transactions in SQLAlchemy.
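
For reference, the snapshot behaviour Simon describes in the quoted message below can be reproduced with a small sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with WAL journaling as a stand-in for MySQL/InnoDB at REPEATABLE READ (the two engines differ in many details), and the function name snapshot_demo and the throwaway table foo are made up for the example.

```python
import os
import sqlite3
import tempfile


def snapshot_demo():
    """Return the row counts seen by connection A before, during and after
    its read transaction, while connection B inserts a row in between.

    Assumption: SQLite in WAL mode stands in for MySQL REPEATABLE READ.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: no implicit BEGIN, we issue it ourselves.
        a = sqlite3.connect(path, isolation_level=None)
        b = sqlite3.connect(path, isolation_level=None)
        try:
            # WAL mode lets B write while A holds a read snapshot.
            a.execute("PRAGMA journal_mode=WAL")
            a.execute(
                "CREATE TABLE foo (bar INTEGER, baz INTEGER, qux INTEGER, "
                "UNIQUE (bar, baz, qux))"
            )
            count = "SELECT COUNT(*) FROM foo"

            a.execute("BEGIN")
            # The first read inside the transaction establishes the snapshot.
            before = a.execute(count).fetchone()[0]

            # B inserts and commits immediately (autocommit mode).
            b.execute("INSERT INTO foo VALUES (1, 1, 1)")

            # Same query, same transaction: A still sees the old snapshot.
            stale = a.execute(count).fetchone()[0]

            # Ending the transaction lets the next read take a new snapshot.
            a.execute("ROLLBACK")
            fresh = a.execute(count).fetchone()[0]
            return before, stale, fresh
        finally:
            a.close()
            b.close()


if __name__ == "__main__":
    print(snapshot_demo())
```

In this sketch the second read still reports zero rows even though the other connection has already committed its insert; the row only becomes visible after the reading transaction ends. If MySQL behaves the same way here, a re-query inside an IntegrityError handler would keep returning the old snapshot until the enclosing transaction is committed or rolled back. As far as I know, a SQLAlchemy Session created with the default autocommit=False opens a transaction implicitly on first use, so a transaction may be in progress even without an explicit begin.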

On Monday, September 2, 2013 3:08:58 PM UTC+3, Simon King wrote:
>
> Do you know what transaction isolation level you are running at? The
> default apparently is "REPEATABLE READ":
>
> http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
>
> The important sentence in that link is:
>
> All consistent reads within the same transaction read the snapshot
> established by the first read
>
> When you query the database for the first time, to see if the entity
> already exists, you are setting that initial snapshot. If you run the
> same query again (such as in your exception handler), you will get the
> same results, whether or not another connection has inserted a
> matching row in the meantime.
>
> Simon
>
> On Mon, Sep 2, 2013 at 12:54 PM, herzaso <her...@gmail.com> wrote:
> > I'm not sure what to make of the results:
> > On the first connection, I ran BEGIN and INSERT and both were successful,
> > but when I tried the INSERT statement on the second connection, I got
> > "ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction".
> > Running the same query on the first connection produced the required result
> > which is "ERROR 1062 (23000): Duplicate entry"
> > After the ROLLBACK on the first connection, the INSERT statement worked well
> > on the second connection
> >
> > Regarding your second remark, the answer is yes, the error was due to the
> > unique constraint on those columns
> >
> > BTW: I'm working on MySQL
> >
> > On Monday, September 2, 2013 1:31:12 PM UTC+3, Simon King wrote:
> >>
> >> I don't really know the answer, but I'd be interested in the results
> >> of this experiment:
> >>
> >> Forget about SQLAlchemy for the moment, and start 2 plain SQL
> >> connections to your database. In the first, type something like the
> >> following:
> >>
> >>     BEGIN;
> >>     INSERT foo(bar, baz, qux) VALUES(1, 1, 1);
> >>
> >> Now in the second connection do the same.
> >> I assume it'll fail because
> >> of the duplicate values.
> >>
> >> Now in the first connection issue a "ROLLBACK". You should now be in a
> >> state where no matching row exists in the database, even though you
> >> received an error about constraint violations.
> >>
> >> The results you see may be different, depending on your transaction
> >> isolation level. (It may be that you don't get the constraint
> >> violation at all until you try to commit the second connection).
> >>
> >> Another thing you could look at: are you sure that the error you are
> >> getting is due to the unique constraint on bar/baz/qux, and not some
> >> other constraint in the database?
> >>
> >> Simon
> >>
> >> On Mon, Sep 2, 2013 at 8:45 AM, herzaso <her...@gmail.com> wrote:
> >> > I'm afraid it didn't solve my problem.
> >> >
> >> > Here is my updated method:
> >> >     @classmethod
> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs):
> >> >         query = session.query(cls).\
> >> >             filter(cls.bar == bar).\
> >> >             filter(cls.baz == baz).\
> >> >             filter(cls.qux == qux)
> >> >
> >> >         item = query.first()
> >> >         updated = False
> >> >
> >> >         if not item:
> >> >             try:
> >> >                 with session.begin_nested():  # run inside a SAVEPOINT
> >> >                     updated = True
> >> >                     item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
> >> >                     session.add(item)
> >> >                     session.flush()
> >> >             except sa.exc.IntegrityError:
> >> >                 item = query.first()
> >> >                 if not item:
> >> >                     raise Exception("invalidIntegrityError")
> >> >             except:
> >> >                 raise
> >> >
> >> >         if not updated:
> >> >             for k, v in kwargs.iteritems():
> >> >                 if getattr(item, k) != v:
> >> >                     setattr(item, k, v)
> >> >
> >> >         return item
> >> >
> >> > With this code, i'm getting invalidIntegrityError. How is it possible?
> >> > (it's also worth pointing out that this solution requires SA 0.8.2
> >> > (otherwise, there is a problem with session.begin_nested))
> >> >
> >> > On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer wrote:
> >> >>
> >> >> I'm not a fan of catching integrity errors, i prefer to try to make
> >> >> sure they aren't going to happen, or if they are, they aren't a normal
> >> >> occurrence and the system is such that the particular operation can
> >> >> just fail (of course it depends on what it is). A problem with
> >> >> catching the integrity error due to concurrent, conflicting operations
> >> >> is that depending on backend and isolation level, you can't be totally
> >> >> sure when the error is going to get raised (e.g. serializable
> >> >> isolation vs. non). Also on a backend like Postgresql, the database
> >> >> can't recover the transaction after an integrity error unless you used
> >> >> a savepoint.
> >> >>
> >> >> But here you're doing the "concurrent transactions need row identity
> >> >> X", so maybe it is appropriate here. Here is a rough idea of a
> >> >> transactional pattern for that, noting this isn't tested:
> >> >>
> >> >>     try:
> >> >>         my_object = Session.query(MyClass).filter(....).one()
> >> >>     except NoResultFound:
> >> >>         try:
> >> >>             with Session.begin_nested():  # run inside a SAVEPOINT
> >> >>                 my_object = MyClass(...)
> >> >>                 Session.add(my_object)
> >> >>                 Session.flush()
> >> >>         except IntegrityError:
> >> >>             my_object = Session.query(MyClass).filter(....).one()
> >> >>
> >> >> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> wrote:
> >> >>
> >> >> Suppose we are looking at a race condition, do you also think this
> >> >> should be handled by catching the IntegrityError?
> >> >> If so, what should I do? only flush and do the operation again?
> >> >>
> >> >> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote:
> >> >>>
> >> >>> the word "occasional" is very meaningful. It usually suggests race
> >> >>> conditions. Then with the word "tornado", the Bayesian filters are
> >> >>> strongly leaning towards "race condition" at that point :).
> >> >>>
> >> >>> if an error is occurring only under volume then you have to revisit
> >> >>> where race conditions can occur.
> >> >>>
> >> >>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote:
> >> >>>
> >> >>> I'm running a Tornado server without redundancy (only one process,
> >> >>> requests can arrive at the same time but will be handled one at a
> >> >>> time)
> >> >>> I do agree that for large volumes, catching the IntegrityError would
> >> >>> be better, but currently I am handling a single request at a time and
> >> >>> I want to fix this problem before I move on ...
> >> >>>
> >> >>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote:
> >> >>>>
> >> >>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote:
> >> >>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote:
> >> >>>> >>
> >> >>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote:
> >> >>>> >> > I have a model with an ID column set as the primary key, though
> >> >>>> >> > i'd like to be able to identify records by 3 other columns.
> >> >>>> >> > For this situation, I've added a classmethod that will fetch the
> >> >>>> >> > record if found or a new record if not.
> >> >>>> >> > The problem i'm having is that every once in a while, I get
> >> >>>> >> > IntegrityError trying to flush a change
> >> >>>> >> >
> >> >>>> >> > class Foo(Base):
> >> >>>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),)
> >> >>>> >> >
> >> >>>> >> >     id = sa.Column(Identifier,
> >> >>>> >> >                    sa.Sequence('%s_id_seq' % __tablename__),
> >> >>>> >> >                    nullable=False, primary_key=True)
> >> >>>> >> >     bar = sa.Column(sa.BigInteger)
> >> >>>> >> >     baz = sa.Column(sa.BigInteger)
> >> >>>> >> >     qux = sa.Column(sa.BigInteger)
> >> >>>> >> >     a1 = sa.Column(sa.BigInteger)
> >> >>>> >> >     a2 = sa.Column(sa.BigInteger)
> >> >>>> >> >
> >> >>>> >> >     @classmethod
> >> >>>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs):
> >> >>>> >> >         item = session.query(cls).\
> >> >>>> >> >             filter(cls.bar == bar).\
> >> >>>> >> >             filter(cls.baz == baz).\
> >> >>>> >> >             filter(cls.qux == qux).\
> >> >>>> >> >             first()
> >> >>>> >> >
> >> >>>> >> >         if item:
> >> >>>> >> >             for k, v in kwargs.iteritems():
> >> >>>> >> >                 if getattr(item, k) != v:
> >> >>>> >> >                     setattr(item, k, v)
> >> >>>> >> >         else:
> >> >>>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
> >> >>>> >> >
> >> >>>> >> >         return item
> >> >>>> >> >
> >> >>>> >> > This is the code I use to add/update records:
> >> >>>> >> >
> >> >>>> >> >     foo = Foo.get(**item)
> >> >>>> >> >     session.merge(foo)
> >> >>>> >> >
> >> >>>> >> > I'm struggling with this problem for some time now, and would
> >> >>>> >> > appreciate any help ...
> >> >>>> >> >
> >> >>>> >>
> >> >>>> >> I'm not sure of the exact problem, but there are a couple of
> >> >>>> >> things that you could investigate.
> >> >>>> >>
> >> >>>> >> Firstly, session.merge returns a copy of the object, rather than
> >> >>>> >> adding the object that you supplied into the session. See
> >> >>>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for
> >> >>>> >> details.
> >> >>>> >>
> >> >>>> >> Secondly, your "get" method sometimes returns objects that are
> >> >>>> >> already part of the session (if they were in the database), and
> >> >>>> >> sometimes objects that are not in the session. It would probably
> >> >>>> >> be more consistent to always return objects that are part of the
> >> >>>> >> session, by putting "session.add(item)" in your "else" clause.
> >> >>>> >> This would get rid of the need for session.merge(). (If you want
> >> >>>> >> to be able to use the "get" with non-global sessions, pass the
> >> >>>> >> session as a parameter.)
> >> >>>> >>
> >> >>>> >> Finally, if your session isn't auto-flushing, it would be possible
> >> >>>> >> for you to call "get" twice with the same parameters and get 2
> >> >>>> >> different objects back.
> >> >>>> >>
> >> >>>> >> You may want to look at the UniqueObject recipe in the wiki:
> >> >>>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject
> >> >>>> >>
> >> >>>> > Hi Simon,
> >> >>>> > Thanks for the fast reply.
> >> >>>> >
> >> >>>> > I tried adding session.add(item) and session.flush() in the else
> >> >>>> > clause in the past but that didn't solve my problem.
> >> >>>> > I didn't however remove the merge, do you think that might be the
> >> >>>> > problem?
> >> >>>> >
> >> >>>> > Regarding the flush, this code is part of an API server where a
> >> >>>> > scoped_session is committed after each change. I haven't changed
> >> >>>> > the autoflush parameter, and as I understand the default value is
> >> >>>> > True making a flush before each commit or query.
> >> >>>> >
> >> >>>> > As for the UniqueObject recipe, thanks! Amazing that I never found
> >> >>>> > it searching for a cure. As I see it basically does the same ...
> >> >>>> >
> >> >>>> > I never managed to reproduce this bug on my development
> >> >>>> > environment.
> >> >>>> > It only happens in my production environment.
> >> >>>> > Do you suppose adding a session.add and removing the merge will
> >> >>>> > solve this issue?
> >> >>>> >
> >> >>>> > Thanks,
> >> >>>> > Ofir
> >> >>>>
> >> >>>> It's difficult to say without knowing more about your system. For
> >> >>>> example, does your production system get multiple concurrent API
> >> >>>> requests, or are they serialised? If 2 requests can come in at
> >> >>>> approximately the same time and are handled by 2 different threads
> >> >>>> (or processes), then it is easy to imagine that the first handler
> >> >>>> will check the database, find that an entry doesn't exist, and
> >> >>>> create it. But before it flushes the change to the database (or even
> >> >>>> after it flushes, but before it commits, depending on your
> >> >>>> transaction isolation), the second handler will check for the same
> >> >>>> object, find it missing, and so create it.
> >> >>>>
> >> >>>> To track down problems like this, you could ensure that your
> >> >>>> development environment has the same thread/process behaviour as the
> >> >>>> production environment, then try submitting multiple concurrent
> >> >>>> requests to it. If you add "time.sleep" statements somewhere between
> >> >>>> the creation of the object and the commit of the transaction you
> >> >>>> will probably find it easier to trigger.
> >> >>>>
> >> >>>> To actually fix the problem, you could choose to only handle a
> >> >>>> single request at a time (fine if you don't expect a high volume of
> >> >>>> requests). If that's not acceptable, you could catch the
> >> >>>> IntegrityError and then re-process the request.
> >> >>>>
> >> >>>> Hope that helps,
> >> >>>>
> >> >>>> Simon