Surprised to hear that it doesn't work with Celery. Is that right? I assumed that was the main target.
If it's really only a benefit in the dag processor, I'm surprised it helps much, because there it should be one call per var per file parse; in the worker it will be once per ti, and I assumed that's where the heavy calls come from. Maybe I'm missing something.