Re: [sqlalchemy] Re: SQLAlchemy JTI subqueryload bug

mike bayer Mon, 19 Jun 2017 05:49:52 -0700

thanks! if they're deep "incorrect query" things in the ORM like this,I fix those really fast. Inconvenient things with Core / typing systemetc., not so much :)



On 06/19/2017 12:12 AM, Sherwin Yu wrote:

Mike, thanks for responding and fixing this so quickly(https://bitbucket.org/zzzeek/sqlalchemy/issues/4011/joined-subclass-to-2-level-subquery-load#comment-37650645)!

Just wanted to take a moment to express my gratitude for your work onSQLA -- it's absolutely critical to our company (Benchling). A few ofour other engineers have reported issues before and you've always beenquick to respond. Thank you!


On Wednesday, June 14, 2017 at 10:23:06 PM UTC-7, Sherwin Yu wrote:



    We found a bug involving filtering a JTI relation and
    subqueryloading. The emitted SQL for the subqueryload does not
    correctly join the child and parent JTI tables, resulting in a cross
    product.

    Info:

      * psql (PostgreSQL) 9.5.7
      * SQLAlchemy 0.9.10

    Suppose we have the following relations (see full repro insructions
    below):

      * Dept
      * Owner
          o dept_id
      * Milestone
          o owner_id
      * Sprint (subclass of Milestone)
          o id references milestone.id <http://milestone.id>


    When doing the following query:

    
Sprint.query.filter(Sprint.id.in_([1])).options(db.subqueryload(Sprint.owner).subqueryload(Owner.dept)).all()

    We see the three following queries, the third of which has an
    incorrect inner SELECT:

    SELECTsprint.id <http://sprint.id>  AS sprint_id,milestone.id 
<http://milestone.id>  AS milestone_id, milestone.milestone_type AS 
milestone_milestone_type, milestone.owner_id AS milestone_owner_id
    FROM milestone JOIN sprint ONmilestone.id <http://milestone.id>  =sprint.id 
<http://sprint.id>
    WHEREsprint.id <http://sprint.id>  IN (%(id_1)s)
    INFO:sqlalchemy.engine.base.Engine:{'id_1': 1}

    SELECTowner.id <http://owner.id>  AS owner_id, owner.dept_id AS 
owner_dept_id, anon_1.milestone_owner_id AS anon_1_milestone_owner_id
    FROM (SELECT DISTINCT milestone.owner_id AS milestone_owner_id
    FROM milestone JOIN sprint ONmilestone.id <http://milestone.id>  =sprint.id 
<http://sprint.id>
    WHEREsprint.id <http://sprint.id>  IN (%(id_1)s)) AS anon_1 JOIN owner ONowner.id 
<http://owner.id>  = anon_1.milestone_owner_id ORDER BY anon_1.milestone_owner_id
    INFO:sqlalchemy.engine.base.Engine:{'id_1': 1}

    SELECTdept.id <http://dept.id>  AS dept_id, owner_1.dept_id AS 
owner_1_dept_id
    FROM (SELECT DISTINCT milestone.owner_id AS milestone_owner_id
    FROM milestone, sprint
    WHEREsprint.id <http://sprint.id>  IN (%(id_1)s)) AS anon_1 JOIN owner AS owner_1 
ONowner_1.id <http://owner_1.id>  = anon_1.milestone_owner_id JOIN dept ONdept.id 
<http://dept.id>  = owner_1.dept_id ORDER BY owner_1.dept_id
    INFO:sqlalchemy.engine.base.Engine:{'id_1': 1}

    In particular, the inner SELECT on the third query, |SELECT DISTINCT
    milestone.owner_id AS milestone_owner_id FROM milestone, sprint
    WHERE sprint.id <http://sprint.id> IN (%(id_1)s)| (corresponding to
    |subqueryload(Owner.dept)|) is selecting from tables milestone and
    sprint without specifying a join condition, which results in the a
    cross product of the two tables (the entire milestone table is
    returned along with sprint.id <http://sprint.id> = 1).

    The temporary work around we found was to replace
    |.filter(Sprint.id.in_([1]))| to |.filter(Milestone.id.in_([1]))| ,
    which then emits this query instead:

    SELECTdept.id <http://dept.id>  AS dept_id, owner_1.dept_id AS 
owner_1_dept_id
    FROM (SELECT DISTINCT milestone.owner_id AS milestone_owner_id
    FROM milestone
    WHEREmilestone.id <http://milestone.id>  IN (%(id_1)s)) AS anon_1 JOIN owner AS owner_1 
ONowner_1.id <http://owner_1.id>  = anon_1.milestone_owner_id JOIN dept ONdept.id 
<http://dept.id>  = owner_1.dept_id ORDER BY owner_1.dept_id

    Which avoids taking the cross product of milestone and sprint.
    (Note, this only works since sprint.id <http://sprint.id> is the
    foreign key to milestone.id <http://milestone.id>, so filtering by
    milestone.id <http://milestone.id> is sufficient)


        Repro

    # Create some tables:
    CREATE TABLE dept (id integer PRIMARY KEY);
    CREATE TABLE owner (id integer PRIMARY KEY, dept_id integer, CONSTRAINT 
owner_dept_id_dept_id_fkey FOREIGN KEY (dept_id) REFERENCES dept (id));
    CREATE TABLE milestone (id integer PRIMARY KEY, milestone_type VARCHAR(64) 
NOT NULL, owner_id integer, CONSTRAINT milestone_owner_id_owner_id_fkey FOREIGN 
KEY (owner_id) REFERENCES dept (id));
    CREATE TABLE sprint (id integer PRIMARY KEY, CONSTRAINT 
sprint_id_milestone_id_fkey FOREIGN KEY (id) REFERENCES milestone (id));

    class Milestone(db.Model):
         __tablename__ = 'milestone'
         id = db.Column(db.Integer, primary_key=True)

         milestone_type = db.Column(db.String(64), nullable=False)
         owner_id = db.Column(
             db.Integer,
             db.ForeignKey('owner.id <http://owner.id>', 
name='milestone_owner_id_owner_id_fkey'))

         owner = db.relationship('Owner', back_populates='milestone')

         __mapper_args__ = {
             'polymorphic_on': milestone_type,
         }


    class Sprint(Milestone):
         __tablename__ = 'sprint'
         id = db.Column(db.Integer,
                        db.ForeignKey('milestone.id <http://milestone.id>', 
name='sprint_id_milestone_id_fkey'),
                        primary_key=True)

         __mapper_args__ = {
             'polymorphic_identity': 'sprint'
         }


    class Owner(db.Model):
         __tablename__ = 'owner'
         id = db.Column(db.Integer, primary_key=True)

         dept_id = db.Column(db.Integer,
                             db.ForeignKey('dept.id <http://dept.id>', 
name='owner_dept_id_dept_id_fkey'))
         dept = db.relationship('Dept', back_populates='owner')
         milestone = db.relationship('Milestone', back_populates='owner')


    class Dept(db.Model):
         __tablename__ = 'dept'
         id = db.Column(db.Integer, primary_key=True)
         owner = db.relationship('Owner', back_populates='dept')


    Then insert some data:

    d = Dept(); o = Owner(); o.dept = d;  s = Sprint(); s.owner = o;
    s.id <http://s.id>  = 1;o.id <http://o.id>  = 1;d.id <http://d.id>  = 1;
    db.session.add(o)
    db.session.commit()


    Then attempt this query:

    
Sprint.query.filter(Sprint.id.in_([1])).options(db.subqueryload(Sprint.owner).subqueryload(Owner.dept)).all()


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, andVerifiable Example. See http://stackoverflow.com/help/mcve for a fulldescription.

---

You received this message because you are subscribed to the GoogleGroups "sqlalchemy" group.To unsubscribe from this group and stop receiving emails from it, sendan email to sqlalchemy+unsubscr...@googlegroups.com<mailto:sqlalchemy+unsubscr...@googlegroups.com>.To post to this group, send email to sqlalchemy@googlegroups.com<mailto:sqlalchemy@googlegroups.com>.

Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

--

SQLAlchemy -The Python SQL Toolkit and Object Relational Mapper


http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.

---You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Re: SQLAlchemy JTI subqueryload bug

Reply via email to