Re: [sqlalchemy] Occasional IntegrityError when identifying model not by its ID

herzaso Mon, 02 Sep 2013 05:20:10 -0700

I do have it set as REPEATABLE READ.
However, I don't use transactions in SQLAlchemy.
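
For reference, here is a minimal sketch of the insert-then-reselect pattern
quoted below. It is written against Python's stdlib sqlite3 rather than
SQLAlchemy and MySQL, purely so that it runs standalone. The get_or_create
function, its before_insert hook and the savepoint name are hypothetical and
only illustrate the idea; the foo table mirrors the columns used in this
thread. SQLite does not reproduce the REPEATABLE READ snapshot behaviour
Simon describes, so on MySQL the second SELECT would probably have to run in
a fresh transaction (or as a locking read) before it could see a row
committed by another connection.

```python
import sqlite3

SELECT_SQL = "SELECT id FROM foo WHERE bar = ? AND baz = ? AND qux = ?"
INSERT_SQL = "INSERT INTO foo (bar, baz, qux) VALUES (?, ?, ?)"


def get_or_create(conn, bar, baz, qux, before_insert=None):
    """Return (row_id, created) for the row identified by (bar, baz, qux).

    before_insert is a hypothetical hook, used only to simulate a competing
    writer that inserts the same key between our SELECT and our INSERT.
    """
    key = (bar, baz, qux)
    row = conn.execute(SELECT_SQL, key).fetchone()
    if row is not None:
        return row[0], False

    if before_insert is not None:
        before_insert()

    # Run the INSERT inside a SAVEPOINT so that a failure can be undone
    # without discarding any enclosing transaction.
    conn.execute("SAVEPOINT get_or_create")
    try:
        cursor = conn.execute(INSERT_SQL, key)
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO SAVEPOINT get_or_create")
        conn.execute("RELEASE SAVEPOINT get_or_create")
        # Somebody else created the row first, so fetch theirs.
        row = conn.execute(SELECT_SQL, key).fetchone()
        if row is None:
            # The violation came from some other constraint; re-raise it.
            raise
        return row[0], False
    conn.execute("RELEASE SAVEPOINT get_or_create")
    return cursor.lastrowid, True


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# only transactions are the ones opened by the SAVEPOINT statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE foo ("
    " id INTEGER PRIMARY KEY,"
    " bar INTEGER, baz INTEGER, qux INTEGER,"
    " UNIQUE (bar, baz, qux))"
)
print(get_or_create(conn, 1, 1, 1))  # first call inserts the row
print(get_or_create(conn, 1, 1, 1))  # second call finds the existing row
```

The hook exists only to make the race reproducible in a single process; real
code would drop it and rely on the unique constraint alone.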


On Monday, September 2, 2013 3:08:58 PM UTC+3, Simon King wrote:
>
> Do you know what transaction isolation level you are running at? The 
> default apparently is "REPEATABLE READ": 
>
>   
> http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
>  
>
> The important sentence in that link is: 
>
>   All consistent reads within the same transaction read the snapshot 
> established by the first read 
>
> When you query the database for the first time, to see if the entity 
> already exists, you are setting that initial snapshot. If you run the 
> same query again (such as in your exception handler), you will get the 
> same results, whether or not another connection has inserted a 
> matching row in the meantime. 
>
> Simon 
>
> On Mon, Sep 2, 2013 at 12:54 PM, herzaso <her...@gmail.com> wrote:
> > I'm not sure what to make of the results: 
> > On the first connection, I ran BEGIN and INSERT and both were successful,
> > but when I tried the INSERT statement on the second connection, I got
> > "ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction".
> > Running the same query on the first connection produced the required result,
> > which is "ERROR 1062 (23000): Duplicate entry".
> > After the ROLLBACK on the first connection, the INSERT statement worked well
> > on the second connection.
> > 
> > Regarding your second remark, the answer is yes, the error was due to the
> > unique constraint on those columns.
> > 
> > BTW: I'm working on MySQL 
> > 
> > On Monday, September 2, 2013 1:31:12 PM UTC+3, Simon King wrote: 
> >> 
> >> I don't really know the answer, but I'd be interested in the results 
> >> of this experiment: 
> >> 
> >> Forget about SQLAlchemy for the moment, and start 2 plain SQL 
> >> connections to your database. In the first, type something like the 
> >> following: 
> >> 
> >> BEGIN; 
> >> INSERT INTO foo (bar, baz, qux) VALUES (1, 1, 1);
> >> 
> >> Now in the second connection do the same. I assume it'll fail because 
> >> of the duplicate values. 
> >> 
> >> Now in the first connection issue a "ROLLBACK". You should now be in a 
> >> state where no matching row exists in the database, even though you 
> >> received an error about constraint violations. 
> >> 
> >> The results you see may be different, depending on your transaction 
> >> isolation level. (It may be that you don't get the constraint 
> >> violation at all until you try to commit the second connection). 
> >> 
> >> Another thing you could look at: are you sure that the error you are 
> >> getting is due to the unique constraint on bar/baz/qux, and not some 
> >> other constraint in the database? 
> >> 
> >> Simon 
> >> 
> >> On Mon, Sep 2, 2013 at 8:45 AM, herzaso <her...@gmail.com> wrote: 
> >> > I'm afraid it didn't solve my problem. 
> >> > 
> >> > Here is my updated method: 
> >> >     @classmethod 
> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
> >> >         query = session.query(cls).\ 
> >> >             filter(cls.bar == bar).\ 
> >> >             filter(cls.baz == baz).\ 
> >> >             filter(cls.qux == qux) 
> >> > 
> >> >         item = query.first() 
> >> >         updated = False 
> >> > 
> >> >         if not item: 
> >> >             try: 
> >> >                 with session.begin_nested():   # run inside a SAVEPOINT
> >> >                     updated = True 
> >> >                     item = cls(bar=bar, baz=baz, qux=qux, **kwargs) 
> >> >                     session.add(item) 
> >> >                     session.flush() 
> >> >             except sa.exc.IntegrityError: 
> >> >                 item = query.first() 
> >> >                 if not item: 
> >> >                     raise Exception("invalidIntegrityError") 
> >> >             except: 
> >> >                 raise 
> >> > 
> >> >         if not updated: 
> >> >             for k, v in kwargs.iteritems(): 
> >> >                 if getattr(item, k) != v: 
> >> >                     setattr(item, k, v) 
> >> > 
> >> >         return item 
> >> > 
> >> > With this code, I'm getting invalidIntegrityError. How is that possible?
> >> > (It's also worth pointing out that this solution requires SA 0.8.2;
> >> > otherwise, there is a problem with session.begin_nested.)
> >> > 
> >> > 
> >> > On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer wrote: 
> >> >> 
> >> >> I'm not a fan of catching integrity errors; I prefer to try to make sure
> >> >> they aren't going to happen, or if they are, that they aren't a normal
> >> >> occurrence and the system is such that the particular operation can just
> >> >> fail (of course it depends on what it is). A problem with catching the
> >> >> integrity error due to concurrent, conflicting operations is that,
> >> >> depending on backend and isolation level, you can't be totally sure when
> >> >> the error is going to get raised (e.g. serializable isolation vs. non).
> >> >> Also, on a backend like PostgreSQL, the database can't recover the
> >> >> transaction after an integrity error unless you used a savepoint.
> >> >> 
> >> >> But here you're doing the "concurrent transactions need row identity X",
> >> >> so maybe it is appropriate here. Here is a rough idea of a transactional
> >> >> pattern for that, noting this isn't tested:
> >> >> 
> >> >> try: 
> >> >>     my_object = Session.query(MyClass).filter(....).one() 
> >> >> except NoResultFound: 
> >> >>     try: 
> >> >>         with Session.begin_nested():   # run inside a SAVEPOINT 
> >> >>             my_object = MyClass(...) 
> >> >>             Session.add(my_object) 
> >> >>             Session.flush() 
> >> >>     except IntegrityError: 
> >> >>         my_object = Session.query(MyClass).filter(....).one() 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> wrote: 
> >> >> 
> >> >> Suppose we are looking at a race condition; do you also think this should
> >> >> be handled by catching the IntegrityError?
> >> >> If so, what should I do? Only flush and do the operation again?
> >> >> 
> >> >> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote: 
> >> >>> 
> >> >>> The word "occasional" is very meaningful. It usually suggests race
> >> >>> conditions. Then with the word "tornado", the Bayesian filters are
> >> >>> strongly leaning towards "race condition" at that point :).
> >> >>>
> >> >>> If an error is occurring only under volume, then you have to revisit
> >> >>> where race conditions can occur.
> >> >>> 
> >> >>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote: 
> >> >>> 
> >> >>> I'm running a Tornado server without redundancy (only one process;
> >> >>> requests can arrive at the same time but will be handled one at a time).
> >> >>> I do agree that for large volumes, catching the IntegrityError would be
> >> >>> better, but currently I am handling a single request at a time and I
> >> >>> want to fix this problem before I move on ...
> >> >>> 
> >> >>> 
> >> >>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote: 
> >> >>>> 
> >> >>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote:
> >> >>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote: 
> >> >>>> >> 
> >> >>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote:
> >> >>>> >> > I have a model with an ID column set as the primary key, though I'd
> >> >>>> >> > like to be able to identify records by 3 other columns.
> >> >>>> >> > For this situation, I've added a classmethod that will fetch the
> >> >>>> >> > record if found, or return a new record if not.
> >> >>>> >> > The problem I'm having is that every once in a while, I get an
> >> >>>> >> > IntegrityError when trying to flush a change.
> >> >>>> >> > 
> >> >>>> >> > class Foo(Base): 
> >> >>>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),)
> >> >>>> >> >
> >> >>>> >> >     id = sa.Column(Identifier, sa.Sequence('%s_id_seq' % __tablename__),
> >> >>>> >> >                    nullable=False, primary_key=True)
> >> >>>> >> >     bar = sa.Column(sa.BigInteger) 
> >> >>>> >> >     baz = sa.Column(sa.BigInteger) 
> >> >>>> >> >     qux = sa.Column(sa.BigInteger) 
> >> >>>> >> >     a1 = sa.Column(sa.BigInteger) 
> >> >>>> >> >     a2 = sa.Column(sa.BigInteger) 
> >> >>>> >> > 
> >> >>>> >> >     @classmethod 
> >> >>>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
> >> >>>> >> >         item = session.query(cls).\ 
> >> >>>> >> >             filter(cls.bar == bar).\
> >> >>>> >> >             filter(cls.baz == baz).\ 
> >> >>>> >> >             filter(cls.qux == qux).\ 
> >> >>>> >> >             first() 
> >> >>>> >> > 
> >> >>>> >> >         if item: 
> >> >>>> >> >             for k, v in kwargs.iteritems(): 
> >> >>>> >> >                 if getattr(item, k) != v: 
> >> >>>> >> >                     setattr(item, k, v) 
> >> >>>> >> >         else: 
> >> >>>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs) 
> >> >>>> >> > 
> >> >>>> >> >         return item 
> >> >>>> >> > 
> >> >>>> >> > This is the code I use to add/update records: 
> >> >>>> >> > 
> >> >>>> >> > foo = Foo.get(**item) 
> >> >>>> >> > session.merge(foo) 
> >> >>>> >> > 
> >> >>>> >> > I've been struggling with this problem for some time now, and would
> >> >>>> >> > appreciate any help ...
> >> >>>> >> > 
> >> >>>> >> 
> >> >>>> >> I'm not sure of the exact problem, but there are a couple of things
> >> >>>> >> that you could investigate.
> >> >>>> >> 
> >> >>>> >> Firstly, session.merge returns a copy of the object, rather than
> >> >>>> >> adding the object that you supplied into the session. See
> >> >>>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for
> >> >>>> >> details.
> >> >>>> >> 
> >> >>>> >> Secondly, your "get" method sometimes returns objects that are already
> >> >>>> >> part of the session (if they were in the database), and sometimes
> >> >>>> >> objects that are not in the session. It would probably be more
> >> >>>> >> consistent to always return objects that are part of the session, by
> >> >>>> >> putting "session.add(item)" in your "else" clause. This would get rid
> >> >>>> >> of the need for session.merge(). (If you want to be able to use the
> >> >>>> >> "get" with non-global sessions, pass the session as a parameter.)
> >> >>>> >> 
> >> >>>> >> Finally, if your session isn't auto-flushing, it would be possible for
> >> >>>> >> you to call "get" twice with the same parameters and get 2 different
> >> >>>> >> objects back.
> >> >>>> >> 
> >> >>>> >> You may want to look at the UniqueObject recipe in the wiki: 
> >> >>>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject 
> >> >>>> >> 
> >> >>>> > Hi Simon, 
> >> >>>> > Thanks for the fast reply. 
> >> >>>> > 
> >> >>>> > I tried adding session.add(item) and session.flush() in the else clause
> >> >>>> > in the past, but that didn't solve my problem.
> >> >>>> > I didn't, however, remove the merge. Do you think that might be the
> >> >>>> > problem?
> >> >>>> >
> >> >>>> > Regarding the flush, this code is part of an API server where a
> >> >>>> > scoped_session is committed after each change. I haven't changed the
> >> >>>> > autoflush parameter, and as I understand it the default value is True,
> >> >>>> > which triggers a flush before each commit or query.
> >> >>>> >
> >> >>>> > As for the UniqueObject recipe, thanks! Amazing that I never found it
> >> >>>> > while searching for a cure. As I see it, it basically does the same ...
> >> >>>> >
> >> >>>> > I never managed to reproduce this bug in my development environment.
> >> >>>> > It only happens in my production environment.
> >> >>>> > Do you suppose adding a session.add and removing the merge will solve
> >> >>>> > this issue?
> >> >>>> > 
> >> >>>> > Thanks, 
> >> >>>> > Ofir 
> >> >>>> 
> >> >>>> It's difficult to say without knowing more about your system. For
> >> >>>> example, does your production system get multiple concurrent API
> >> >>>> requests, or are they serialised? If 2 requests can come in at
> >> >>>> approximately the same time and are handled by 2 different threads (or
> >> >>>> processes), then it is easy to imagine that the first handler will
> >> >>>> check the database, find that an entry doesn't exist, and create it.
> >> >>>> But before it flushes the change to the database (or even after it
> >> >>>> flushes, but before it commits, depending on your transaction
> >> >>>> isolation), the second handler will check for the same object, find it
> >> >>>> missing, and so create it.
> >> >>>>
> >> >>>> To track down problems like this, you could ensure that your
> >> >>>> development environment has the same thread/process behaviour as the
> >> >>>> production environment, then try submitting multiple concurrent
> >> >>>> requests to it. If you add "time.sleep" statements somewhere between
> >> >>>> the creation of the object and the commit of the transaction, you will
> >> >>>> probably find it easier to trigger.
> >> >>>>
> >> >>>> To actually fix the problem, you could choose to only handle a single
> >> >>>> request at a time (fine if you don't expect a high volume of
> >> >>>> requests). If that's not acceptable, you could catch the
> >> >>>> IntegrityError and then re-process the request.
> >> >>>> 
> >> >>>> Hope that helps, 
> >> >>>> 
> >> >>>> Simon 
> >> >>> 
>

