We at Lyft are starting to think about developing pipelines using Jupyter, and were wondering if there are any notes from this meeting, or follow-up discussions, that might be relevant. We're interested in prior art and in seeing what's already been done.

*Max Payton*
He/Him/His
Software Engineer
202.441.7757
On Wed, Nov 4, 2020 at 9:29 PM QP Hou <[email protected]> wrote:

> On Tue, Nov 3, 2020 at 12:05 PM Alan K Chin <[email protected]> wrote:
> > @Jarek - It sounds like git-sync is or rather should be the default way
> > users add/modify DAGs. With that said, have you had any experience with
> > customers syncing their dags to other forms of dag storage (S3 etc.) and
> > what the outcomes were?
>
> We have been using S3 for DAG sync in production for more than a year
> now. The biggest benefit is we basically never have to worry about
> scalability or availability compared to other solutions. Access
> control can be managed through IAM roles, which is entirely
> transparent to application code. On top of that, the S3 event delivery
> feature can be leveraged to avoid the pulling loop to make sync almost
> real time. Only downside is you need to set up a CI/CD pipeline to
> publish DAG changes to S3. I wrote about our implementation in our
> tech blog at
> https://tech.scribd.com/blog/2020/breaking-up-the-dag-repo.html.
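
The event-driven sync mentioned in the quoted message could look roughly like the sketch below. It is not the Scribd implementation (the linked post covers that); it only assumes the standard S3 event notification JSON layout (`Records[].eventName`, `s3.bucket.name`, `s3.object.key`). The function name `plan_sync` and the `/opt/airflow/dags` folder are hypothetical, and the actual download or delete step is left to the caller.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Hypothetical local DAG folder; adjust to your deployment.
DAGS_DIR = "/opt/airflow/dags"


def plan_sync(event_json, dags_dir=DAGS_DIR):
    """Turn one S3 event notification into a list of sync actions.

    Returns (action, bucket, key, local_path) tuples, where action is
    "download" or "delete". Nothing is fetched or removed here.
    """
    actions = []
    for record in json.loads(event_json).get("Records", []):
        event_name = record["eventName"]
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"]).lstrip("/")
        local_path = str(PurePosixPath(dags_dir) / key)
        if event_name.startswith("ObjectCreated:"):
            actions.append(("download", bucket, key, local_path))
        elif event_name.startswith("ObjectRemoved:"):
            actions.append(("delete", bucket, key, local_path))
    return actions
```

A worker that receives these notifications (for example through a queue) would call `plan_sync` on each message and apply the resulting actions, so DAG changes published by CI/CD reach the scheduler without a periodic poll of the bucket.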
