Hey everyone,

With Amaterasu-8 in place and Amaterasu-45 nearing completion, there is one more critical thing we need to attend to before we can call the new architecture complete.
Dependency resolution, management, and distribution.

One piece of feedback I received at PyCon-IL last year was about us not using the standard Python dependency resolution mechanisms. For example, we currently require users to supply a "python.yml" file instead of a "requirements.txt" file. The same could be applied to the Scala counterpart: instead of defining the dependencies in "jars.yml", we could leverage one or more of the available build tools to manage this (maybe by supplying a pom.xml or a build.gradle file as part of the job repository, somehow). For Python specifically, going forward, we will definitely let the user provide a requirements.txt file.

Dependency distribution is also a problem I would like to think about.

Option 1: The leader merges job-level and action-level dependencies into one dependency definition file (e.g. requirements.txt) and distributes that file to the executors.
Pros: Easy to implement.
Cons: Will not work in networks without an outgoing internet connection, as the executors will need access to an external package repository.

Option 2: The leader does a full dependency resolution and distributes the already downloaded and packaged dependencies to the executors, where they are installed.
Pros: Works in any environment; only the leader needs an outgoing internet connection.
Cons:
1. Much more difficult to implement. Requires implementing wrappers around language-specific package management systems.
2. Introduces more state to the system that has to be managed. Another "weak link", in my opinion.

Anyhow, let's discuss this. I would like to hear suggestions, thoughts, etc.

Cheers,
Nadav
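
P.S. To make Option 1 a bit more concrete, here is a rough sketch of what the merge step on the leader could look like for Python. Everything in it is an assumption for the sake of discussion: the function name merge_requirements is hypothetical, nothing like it exists in Amaterasu today, and the rule of failing when the job and an action ask for different versions of the same package is just one possible policy. It only handles plain "name plus version specifier" lines, not URLs or pip options such as "-r".

```python
import re

# Leading package name of a simple requirement line (assumption: no URLs or pip options).
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def _normalize(name):
    """Normalize a package name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def merge_requirements(job_lines, action_lines):
    """Merge job-level and action-level requirement lines into one list.

    Hypothetical helper: duplicates are dropped, and a package requested
    with two different specifiers raises ValueError instead of guessing.
    """
    merged = {}  # normalized name -> (name as first written, specifier)
    for raw in list(job_lines) + list(action_lines):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _NAME.match(line)
        if match is None:
            raise ValueError("unsupported requirement line: %r" % raw)
        key = _normalize(match.group(0))
        spec = line[match.end():].replace(" ", "")
        previous = merged.get(key)
        if previous is not None and previous[1] != spec:
            raise ValueError(
                "conflicting requirements for %s: %r vs %r"
                % (key, previous[1], spec)
            )
        merged.setdefault(key, (match.group(0), spec))
    return [name + spec for name, spec in merged.values()]
```

The leader would write the returned lines to a single requirements.txt and ship that file to the executors, which still need access to a package repository to install from it (the main drawback of Option 1).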