ashb opened a new pull request #20722:
URL: https://github.com/apache/airflow/pull/20722


   This uses the "bulk" operation API of SQLAlchemy to get a big speed
   up. Due to the `task_instance_mutation_hook` we still need to keep
   actual TaskInstance objects around.
   
   For postgresql we have enabled to "batch operation helpers"[1] which
   makes it even faster. The default page sizes are chosen somewhat
   randomly based on the SQLA docs.
   
   To make these options configurable I have added (and used here and in
   KubeConfig) a new `getjson` option to AirflowConfigParser class.
   
   *Postgresql is over 40% faster*:
   
   Before:
   
   ```
   number_of_tis=1 mean=0.004397215199423954 per=0.004397215199423954 
times=[0.009390181003254838, 0.002814065999700688, 0.00284132499655243, 
0.0036120269942330196, 0.0033284770033787936]
   number_of_tis=10 mean=0.008078816600027494 per=0.0008078816600027494 
times=[0.011014281000825576, 0.008476420000079088, 0.00741832799394615, 
0.006857775995740667, 0.006627278009545989]
   number_of_tis=50 mean=0.01927847799670417 per=0.00038556955993408336 
times=[0.02556803499464877, 0.01935569499619305, 0.01662322599440813, 
0.01840184700267855, 0.01644358699559234]
   number_of_tis=100 mean=0.03301511880126782 per=0.00033015118801267817 
times=[0.04117956099798903, 0.030890661000739783, 0.03007458901265636, 
0.03125198099587578, 0.03167880199907813]
   number_of_tis=500 mean=0.15320950179593637 per=0.0003064190035918727 
times=[0.20054609200451523, 0.14052859699586406, 0.14509809199080337, 
0.1365471329918364, 0.1433275949966628]
   number_of_tis=1000 mean=0.2929377429973101 per=0.0002929377429973101 
times=[0.3517978919990128, 0.2807794280088274, 0.2806490379880415, 
0.27710555399244186, 0.27435680299822707]
   number_of_tis=3000 mean=0.9935687056015012 per=0.00033118956853383374 
times=[1.2047388390055858, 0.8248025969951414, 0.8685875020019012, 
0.9017027500085533, 1.1680118399963249]
   number_of_tis=5000 mean=1.5349355740036117 per=0.00030698711480072236 
times=[1.8663743910001358, 1.5182018500054255, 1.5446484510030132, 
1.3932801040064078, 1.3521730740030762]
   number_of_tis=10000 mean=3.7448632712010292 per=0.0003744863271201029 
times=[4.135914924001554, 3.4411147559876554, 3.526543836007477, 
3.7195197630062466, 3.9012230770022143]
   number_of_tis=15000 mean=6.3099766838044165 per=0.00042066511225362775 
times=[6.552250057997298, 6.1369703890086384, 6.8749958210100885, 
6.067943914007628, 5.917723236998427]
   number_of_tis=20000 mean=8.317583500797628 per=0.00041587917503988143 
times=[8.720249108009739, 8.0188543760014, 8.328030352990027, 
8.398350054994808, 8.122433611992165]
   ```
   
   After:
   
   ```
   number_of_tis=1 mean=0.026246879794052803 per=0.026246879794052803 
times=[0.031441625993466005, 0.025166517996694893, 0.02518146399233956, 
0.024703859991859645, 0.02474093099590391]
   number_of_tis=10 mean=0.02652196400158573 per=0.002652196400158573 
times=[0.027266821009106934, 0.026017504002084024, 0.02769480799906887, 
0.025840838003205135, 0.025789848994463682]
   number_of_tis=50 mean=0.032463929001824 per=0.00064927858003648 
times=[0.03659850900294259, 0.03128377899702173, 0.03133225999772549, 
0.030985830002464354, 0.032119267008965835]
   number_of_tis=100 mean=0.03862043260014616 per=0.0003862043260014616 
times=[0.04082123900298029, 0.03752484500000719, 0.037281844997778535, 
0.03927708099945448, 0.0381971530005103]
   number_of_tis=500 mean=0.10123570079740603 per=0.00020247140159481206 
times=[0.11780315199575853, 0.09932849500910379, 0.10016329499194399, 
0.09410478499194141, 0.09477877699828241]
   number_of_tis=1000 mean=0.17536458960094023 per=0.00017536458960094024 
times=[0.20034298300743103, 0.17775658299797215, 0.17178491500089876, 
0.16488367799320258, 0.16205478900519665]
   number_of_tis=3000 mean=0.5013463032053551 per=0.00016711543440178504 
times=[0.6868100110004889, 0.46566563300439157, 0.44849480800621677, 
0.4379984680126654, 0.46776259600301273]
   number_of_tis=5000 mean=0.840471555799013 per=0.0001680943111598026 
times=[1.0285392189980485, 0.8854761679976946, 0.7579579270095564, 
0.730956947998493, 0.7994275169912726]
   number_of_tis=10000 mean=1.975292908004485 per=0.0001975292908004485 
times=[1.9648507620004239, 1.8537165410089074, 1.8826112380047562, 
1.9254138420074014, 2.2498721570009366]
   number_of_tis=15000 mean=3.4746556333935588 per=0.00023164370889290392 
times=[4.0400224499899196, 3.1751998239924433, 3.6206128539924975, 
3.6852884859981714, 2.8521545529947616]
   number_of_tis=20000 mean=4.678154367001843 per=0.00023390771835009216 
times=[4.465847548010061, 4.571855771995615, 4.749505186002352, 
4.724330568002188, 4.8792327609990025]
   ```
   
   MySQL is only 10-15% faster (and a lot noisier)
   
   Before:
   
   ```
   number_of_tis=1 mean=0.006164804595755413 per=0.006164804595755413 
times=[0.013516580002033152, 0.00427598599344492, 0.004508020996581763, 
0.004067091998877004, 0.004456343987840228]
   number_of_tis=10 mean=0.007822793803643435 per=0.0007822793803643434 
times=[0.0081135170039488, 0.00719467100861948, 0.009007985994685441, 
0.00758794900320936, 0.007209846007754095]
   number_of_tis=50 mean=0.020377356800599954 per=0.00040754713601199905 
times=[0.02612382399092894, 0.018950315003166907, 0.019109474000288174, 
0.018008680999628268, 0.019694490008987486]
   number_of_tis=100 mean=0.040682651600218375 per=0.00040682651600218374 
times=[0.05449078499805182, 0.037430580996442586, 0.039291110006161034, 
0.03625023599306587, 0.035950546007370576]
   number_of_tis=500 mean=0.18646696420037187 per=0.00037293392840074375 
times=[0.24278165798750706, 0.17090376401029062, 0.1837275660072919, 
0.16893767600413412, 0.1659841569926357]
   number_of_tis=1000 mean=0.5903461098030676 per=0.0005903461098030675 
times=[0.6001852740009781, 0.5642872750031529, 0.686630773008801, 
0.5578094649972627, 0.5428177620051429]
   number_of_tis=3000 mean=1.9076304554007948 per=0.0006358768184669316 
times=[2.042052763994434, 2.1137778090051142, 1.7461599689995637, 
1.7260139089921722, 1.9101478260126896]
   number_of_tis=5000 mean=2.9185905692051164 per=0.0005837181138410233 
times=[2.9221124830073677, 3.2889883980096783, 2.7569778940087417, 
2.973596281008213, 2.651277789991582]
   number_of_tis=10000 mean=8.880191986600403 per=0.0008880191986600403 
times=[7.3548113360011484, 9.13715232499817, 9.568511486999341, 
8.80206210000324, 9.538422685000114]
   number_of_tis=15000 mean=15.426499317999696 per=0.0010284332878666464 
times=[14.944712879005237, 15.38737604500784, 15.409629273999599, 
15.852925243991194, 15.53785314799461]
   number_of_tis=20000 mean=20.579332908798825 per=0.0010289666454399414 
times=[20.362008597003296, 19.878823954990366, 20.73281196100288, 
20.837948996995692, 21.085071034001885]
   ```
   
   After:
   
   ```
   number_of_tis=1 mean=0.04114753239555284 per=0.04114753239555284 
times=[0.05534043599618599, 0.03716265498951543, 0.039479082988691516, 
0.03779561800183728, 0.035959870001534]
   number_of_tis=10 mean=0.038440523599274454 per=0.003844052359927445 
times=[0.03949839199776761, 0.03853203100152314, 0.03801383898826316, 
0.03784418400027789, 0.03831417200854048]
   number_of_tis=50 mean=0.05345874359773006 per=0.0010691748719546012 
times=[0.07045628099876922, 0.04431965999538079, 0.06068256100115832, 
0.04566028399858624, 0.04617493199475575]
   number_of_tis=100 mean=0.06805712619971019 per=0.0006805712619971019 
times=[0.07946423999965191, 0.06054415399557911, 0.06277450300694909, 
0.07836744099040516, 0.05913529300596565]
   number_of_tis=500 mean=0.17929348759935237 per=0.00035858697519870476 
times=[0.2792787920043338, 0.16563376400154084, 0.14093860499269795, 
0.1464673139998922, 0.16414896299829707]
   number_of_tis=1000 mean=0.3883620931970654 per=0.00038836209319706536 
times=[0.47511668599327095, 0.3506359229941154, 0.43458069299231283, 
0.33563552900159266, 0.3458416350040352]
   number_of_tis=3000 mean=1.3977356655988842 per=0.0004659118885329614 
times=[1.575020256001153, 1.3353702509921277, 1.4193720350012882, 
1.4037733709992608, 1.2551424150005914]
   number_of_tis=5000 mean=2.3742491033975965 per=0.0004748498206795193 
times=[2.4926851909986, 2.501419166001142, 2.2862377730052685, 
2.4421103859931463, 2.1487930009898264]
   number_of_tis=10000 mean=8.138347979800892 per=0.0008138347979800893 
times=[6.648954969001352, 8.001181932995678, 8.551437315007206, 
9.084980526997242, 8.405185155002982]
   number_of_tis=15000 mean=14.065810968197184 per=0.0009377207312131455 
times=[13.222158194999793, 14.375066226988565, 14.108006285998272, 
14.157014351992984, 14.466809781006305]
   number_of_tis=20000 mean=18.36637533060275 per=0.0009183187665301375 
times=[17.728908119010157, 18.62269214099797, 18.936747477011522, 
17.74613195299753, 18.797396962996572]
   ```
   
   [1]: 
https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-batch-mode
   
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to