+1 to all 3, considering that we are trying to centralise the code. Should 2 eventually be redone as part of https://issues.apache.org/jira/browse/APEXCORE-796? In any case, the design for this needs to be seen in the broader context of the points below.
Regarding 3, I agree that the current image is tightly coupled to bigtop. While making it independent of bigtop is a good first step, I believe we may need to revisit how we want to implement containerisation for Apex in the first place. There are multiple design items to resolve for Apex containerisation:

1. The Apex community needs to evaluate both Hadoop-based and Hadoop-free architectures. For non-Hadoop architectures, we need to find alternatives for the DFS as well as for the resource manager. I believe tickets like https://issues.apache.org/jira/browse/APEXCORE-724 will bring out this design issue in more detail.
2. Consider how Apex applications will be built as part of a build process that results in a docker image of the Apex application (which would contain the application code, Malhar operators, etc.).
3. Consider how we would like to make use of the Hadoop 3 support for Docker: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html

Just curious about the docker implementation: is the end goal of the docker image to

1. provide a sandbox for evaluating Apex,
2. make the Apex installable binary available as an image, or
3. align Apex applications with a docker build process (e.g. Python libraries installed on the image as part of the application code)?

The reason I raise these questions is that it does not make much sense to bundle a cluster in a box with any distribution (dockerizing a Hadoop cluster is non-trivial, and I have not heard good success stories of this approach being enabled for production). A docker image that embeds a Hadoop binary is thus only useful for evaluation, where everything is contained in the same image, and nothing more. My suspicion is that we would have to revisit this approach anyway if our goals include 2 and/or 3.

Perhaps we will address these questions as part of https://issues.apache.org/jira/browse/APEXCORE-724 and https://issues.apache.org/jira/browse/APEXCORE-796.

Regards,
Ananth

On Fri, May 4, 2018 at 10:31 AM, Vlad Rozov <vro...@apache.org> wrote:

> +1 to all 3.
>
> Thank you,
>
> Vlad
>
> On 5/3/18 07:03, Thomas Weise wrote:
>
>> +1 to all of this
>>
>> There are existing JIRAs that you can assign / add to:
>>
>> https://issues.apache.org/jira/browse/APEXCORE-727
>>
>> Thanks!
>>
>> On Thu, May 3, 2018 at 4:26 AM, Chinmay Kolhatkar <chin...@apache.org> wrote:
>>
>>> Hello Community,
>>>
>>> I want to propose the following improvements for the apex-core build and related steps:
>>>
>>> 1. Most (probably all) open source projects have a binary release package of the software and not just a source release package. Currently we have only a source package. Luckily there are a few places (outside of Apache Apex) where binary packages of Apex have been created for different purposes: https://github.com/atrato/apex-cli-package & https://github.com/apache/bigtop
>>>
>>> The proposal here is to generate this binary release package as part of the build process of apex-core.
>>>
>>> 2. Currently, the docker build that is being created for Apex is built from one of my personal repositories (https://github.com/chinmaykolhatkar/docker-pool). While I don't mind hosting the content (Dockerfile etc.) in my repository, I believe it makes sense to host it in the apex-core repository. This way, there is a possibility of using docker github triggers for building the docker image from release branches.
>>>
>>> 3. Currently the docker build uses hadoop and apex specific packages from the bigtop deb repo & CI. (See https://github.com/chinmaykolhatkar/docker-pool/blob/master/apex/ubuntu/app/setup.sh for more details.)
>>> While the use of hadoop packages from the bigtop repo is fine, we also need to rely on a bigtop contribution to update the apex component and then build from bigtop CI to get the apex.deb package. Basically, our docker image generation process gets blocked on a bigtop source update to generate the updated apex deb.
>>> As we technically don't need to depend on bigtop to generate the apex binary, the proposal here is to generate the binary package during the build process (point 1) and use that during the docker image build process instead of using the ready-made deb package from bigtop CI.
>>>
>>> I understand that there are multiple items being mentioned in a single mail, but they seem related, hence the mail.
>>>
>>> Please let me know your opinion on the above items.
>>>
>>> Thanks,
>>> Chinmay.