kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-528697431

Some more testing:

For another trial, I am completely removing the need to load JSON from a str by using JSON columns instead of str columns.

I just ran some benchmarks on my local machine and the results are very impressive. Not having to load JSON from a str, and vice versa, seems to have halved the time needed to deserialize DAGs.

Basically, when we use JSON columns, we no longer need to convert JSON to a str when writing it to the DB, as SQLAlchemy does that for us: it can write a Python dict as a JSON object directly. Similarly, it works the other way round, i.e. when reading from the DB it reads the JSON directly and creates a Python dictionary.

For 100 DAGs:

Parsing from file: 19.6 s (14.6 s - best run after 5 runs)
DAG serialisation with `json.loads`: 26.5 s (17.8 s - best run after 5 runs)
DAG serialisation with `ujson`: 25.8 s (17.3 s - best run after 5 runs)
DAG serialisation with *JSON columns* (removed converting str to JSON and vice versa): 12.1 s (6.98 s ± 169 ms - best run after 5 runs)

However, I still need to test these results on our staging cluster too, as they can be very different there. Will do it tomorrow. It has been a long day fighting with JSON libraries - ~5AM here :sleeping:

Postgres JSONB might be even quicker! Will try that out tomorrow.
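
To illustrate the shape of the change described above, here is a minimal sketch in which the database layer converts between a Python dict and stored JSON, so the calling code never calls `json.dumps` or `json.loads` itself. It is a stand-in only: it uses the standard-library `sqlite3` adapter and converter hooks rather than SQLAlchemy's JSON column type, and the table name `serialized_dag`, the `roundtrip` helper, and the payload are all hypothetical. It does not reproduce the benchmark numbers.

```python
import json
import sqlite3

# Register the conversions once, at the driver level:
# dicts are stored as JSON text, and columns declared as JSON are decoded on read.
sqlite3.register_adapter(dict, json.dumps)
sqlite3.register_converter("JSON", json.loads)


def roundtrip(dag_id, payload):
    """Write a dict to a JSON-declared column and read it back as a dict."""
    # PARSE_DECLTYPES makes sqlite3 apply the converter based on the declared column type.
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    try:
        # Hypothetical table, for illustration only.
        conn.execute(
            "CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data JSON)"
        )
        # The dict is passed as-is; no explicit json.dumps in the calling code.
        conn.execute(
            "INSERT INTO serialized_dag (dag_id, data) VALUES (?, ?)",
            (dag_id, payload),
        )
        row = conn.execute(
            "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        # The value comes back already decoded; no explicit json.loads here either.
        return row[0]
    finally:
        conn.close()


example = {"dag_id": "example_dag", "tasks": [{"task_id": "t1"}, {"task_id": "t2"}]}
loaded = roundtrip("example_dag", example)
```

Note that in this sketch the encoding and decoding still happen, just inside the driver hooks; whether a given database and driver combination actually saves time this way is what the benchmarks above are meant to measure.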
