kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-528697431
 
 
   Some more testing:
   
   For another trial, I am completely removing the need to load JSON from a str 
by using JSON columns instead of str columns.
   
   Just did some benchmarks on my local machine and the results are very 
impressive. Not having to load JSON from a str (and vice versa) seems to have 
halved the time needed to deserialize DAGs.
   
   Basically, when we use JSON columns, we no longer need to convert JSON to a 
str when writing it to the DB, as SQLAlchemy does that for us: it can write a 
Python dict as a JSON object directly. Similarly, in the other direction, when 
reading from the DB it reads the JSON and creates a Python dict directly.
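
   A minimal sketch of that round trip, assuming SQLAlchemy 1.3+ and an 
in-memory SQLite database. The table name, column names and DAG payload are 
hypothetical and are not the schema used in this PR:

```python
from sqlalchemy import JSON, Column, Integer, MetaData, String, Table, create_engine

# Hypothetical table and column names, for illustration only.
metadata = MetaData()
serialized_dag = Table(
    "serialized_dag_example",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("dag_id", String(250)),
    Column("data", JSON),  # JSON column: accepts and returns Python objects
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)

# Made-up payload standing in for a serialized DAG.
dag_dict = {"dag_id": "example_dag", "tasks": [{"task_id": "t1"}, {"task_id": "t2"}]}

with engine.begin() as conn:
    # No explicit json.dumps: the dict is handed to the column type as-is.
    conn.execute(serialized_dag.insert().values(dag_id="example_dag", data=dag_dict))
    row = conn.execute(
        serialized_dag.select().where(serialized_dag.c.dag_id == "example_dag")
    ).fetchone()

# No explicit json.loads: the value comes back as a Python dict.
loaded = row.data
```

   The calling code never touches a JSON string in either direction; the 
conversion is left to the column type and the database driver.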
   
   For 100 Dags,
   
   Parsing from file: 19.6 s (14.6 s - Best run after 5 runs)
   Dag Serialisation with `json.loads`: 26.5 s (17.8 s - Best Run after 5 runs)
   Dag Serialisation with `ujson`: 25.8 s (17.3 s - Best Run after 5 runs)
   Dag Serialisation with *JSON columns* (no str-to-JSON conversion or 
vice versa): 12.1 s (6.98 s ± 169 ms - Best Run after 5 runs)
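
   For reference, a "best of 5 runs" figure for the str round trip alone can be 
collected with `timeit` along these lines. This is only a sketch: the payload 
is synthetic, the function name is made up, and it is not the benchmark that 
produced the numbers above:

```python
import json
import timeit

# Synthetic stand-in for 100 serialized DAGs (assumption, not real DAG data).
dags = [
    {"dag_id": f"dag_{i}", "tasks": [{"task_id": f"task_{j}"} for j in range(50)]}
    for i in range(100)
]


def str_round_trip():
    """Convert every dict to a JSON str and back, as a str column would require."""
    return [json.loads(json.dumps(dag)) for dag in dags]


# 5 repeats of 10 passes each; report the best repeat, averaged per pass.
timings = timeit.repeat(str_round_trip, repeat=5, number=10)
best = min(timings) / 10
print(f"best of 5: {best * 1000:.2f} ms per pass over {len(dags)} DAGs")
```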
   
   However, I need to test these results on our staging cluster too, as they 
can be very different there. Will do it tomorrow. It has been a long day 
fighting with JSON libraries - ~5AM here :sleeping: 
   
   Postgres JSONB might be even quicker! Will try that out tomorrow too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
