dvchristianbors opened a new issue, #19567: URL: https://github.com/apache/superset/issues/19567
When increasing the number of keys in `IN` clauses, the runtime of the query is sgnificantly increased. This means that queries with large numbers of keys are running for a very long time (several seconds for 200 keys), even though a direct query on the underlying database will terminate within milliseconds. See discussion in Slack: https://apache-superset.slack.com/archives/C014LS99C1K/p1633448327074000 #### How to reproduce the bug Precondition: Load an example dataset 1. Go to 'Data' and open "Explore" _your example dataset_ (e.g., birth_names from the example data) 2. Click on 'Query Mode' > 'Raw Records' and add at least one column 3. In Filters, click the _Plus_-button to add a new Filter - Custom SQL 4. Enter a short `IN` query with several keys, e.g. `name IN ("Liam", "James", "Noah", "Wyatt", "Gabriel", "Lucas", "Ethan", "Alexander", "Joseph", "Benjamin") 5. Enter a long `IN` query with at hundrets of keys, e.g., `name IN ("Liam","James", "Noah", "Wyatt", "Gabriel", "Lucas", "Ethan", "Alexander", "Joseph", "Benjamin", "William", "Logan", "Mason", "Jack", "John", "Asher", "Elijah", "Daniel", "Henry", "Jacob", "Jaxon", "Michael", "Oliver", "Hunter", "David", "Levi", "Matthew", "Landon", "Aiden", "Isaac", "Jackson", "Caleb", "Ryan", "Elias", "Connor", "Evan", "Joshua", "Samuel", "Christian", "Jayden", "Jeremiah", "Cooper", "Eli", "Robert", "Ryder", "Christopher", "Colton", "Josiah", "Andrew", "Austin", "Carson", "Jaxson", "Jonathan", "Luke", "Malachi", "Nathan", "Owen", "Blake", "Lincoln", "Ezra", "Gavin", "Thomas", "Dylan", "Grayson", "Kai", "Ryker", "Zachary", "Anthony", "Isaiah", "Jase", "Jason", "Micah", "Sebastian", "Silas", "Titus", "Bentley", "Brody", "Cameron", "Carter", "Chase", "Gideon", "Jace", "Sawyer", "Tristan", "Tyler", "Weston", "Adam", "Charles", "Everett", "Wesley", "Xander", "Brandon", "Brayden", "Nathaniel", "Theod ore", "Xavier", "Ashton", "Avery", "Dominic", "Easton", "Finn", "George", "Hudson", "Ian", "Jasper", "Kayden", "Marshall", "Max", "Maxwell", "Miles", "Orion", "Richard", "Timothy", "Abel", "Drake", "Garrett", "Jameson", "Jayce", "Joel", "Kenneth", "Maximus", "Nicholas", "Parker", "Travis", "Cody", "Dean", "Declan", "Elliot", "Ezekiel", "Karter", "Nolan", "Patrick", "Riley", "Seth", "Solomon", "Steven", "Victor", "Waylon", "Aaron", "August", "Bradley", "Braxton", "Bryce", "Calvin", "Camden", "Cayden", "Charlie", "Cole", "Damian", "Dawson", "Eric", "Greyson", "Jake", "Jeffrey", "Jesse", "Jonah", "Julian", "Kaiden", "Killian", "Kingston", "Maddox", "Matthias", "Maverick", "Odin", "Paul", "Peter", "Roman", "Trevor", "Zane", "Alex", "Archer", "Caden", "Collin", "Colt", "Edward", "Gage", "Gunner", "Harrison", "Ivan", "Jax", "Leo", "Lukas", "Marcus", "Paxton", "Soren", "Sullivan", "Tanner", "Trenton", "Troy", "Tucker", "Vincent", "Walter", "Warren", "Adrian", "Augustus", "Axel", "Beckett", "Cade", "Clayton", "Dante")` (see a list of baby names [here](https://www.kaggle.com/datasets/kaggle/us-baby-names?resource=download&select=StateNames.csv) 6. Compare the computation times. Even though an in clause would return the data almost instantly ### Expected results Computation time is roughly similar, within a few milliseconds. ### Actual results In my local setup, the difference is: Short `IN` query: 0.34 sec Long `IN` query: 1.62 sec Even longer `IN` query: 6.07 sec Upon adding more clauses, the runtime increases in quadratic time. #### Screenshots Short `IN` query (20 keys):  Long `IN` query (200 keys):  Even longer `IN` query (500 keys):  ### Environment - browser type and version: Tested on Chrome: Version 100.0.4896.60 (Official Build) (64-bit) - superset version: master branch commit 03d3eaacafc6ebdad7fdbcef6efa4df553468ba1 - python version: 3.8.10 - node.js version: v12.22.9 - any feature flags active: None ### Checklist Make sure to follow these steps before submitting your issue - thank you! ### Additional context The source of this quadratic runtime of the query is caused by both the `sqlparse.parse` and `sqlparse.format` functions called in numerous places (`/models/core.py`, `db_engine_specs/base.py`, `connectors/sqla/models.py`, and `common/query_object.py`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
