+1 to all 3 considering we are trying to centralise the code.

Should item 2 eventually be redone as part of
https://issues.apache.org/jira/browse/APEXCORE-796? In any case, the design
for this needs to be seen in the broader context of some of the points
mentioned below.

Regarding 3, I agree that the current image is tightly coupled to Bigtop.
While making it independent of Bigtop is a good first step, I believe we
might need to revisit how we would like to implement containerisation for
Apex in the first place.


There are multiple design items to be resolved for Apex containerisation:

1. The Apex community needs to evaluate both Hadoop-based and Hadoop-free
architectures. For Hadoop-free architectures, we need to find alternatives
for the DFS as well as for the resource manager. I believe tickets like
https://issues.apache.org/jira/browse/APEXCORE-724 will bring out this
design issue in more detail.

2. Consider how Apex applications will be built as part of a build process
that results in a Docker image of the Apex application (one that would
contain the application code, Malhar operators, etc.).

3. Consider how we would like to make use of the Hadoop 3 support for Docker:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html




Just curious about the Docker implementation: is the end goal of the Docker
image to

1. provide a sandbox for evaluating Apex, or
2. make the Apex installable binary available as an image, or
3. align Apex applications with a Docker build process (e.g. Python
libraries installed on the image as part of the application code)?

The reason I raise these questions is that it does not make much sense to
bundle a cluster in a box with any distribution (dockerizing a Hadoop
cluster is non-trivial, and so far I have not heard of good success stories
with this approach that could be carried over to production). A Docker image
that embeds a Hadoop binary is thus only useful for evaluation, where
everything is contained in the same image, and nothing more.

My suspicion is that we would have to revisit this approach anyway if our
goals include 2 and/or 3 as well. Perhaps we will address these questions as
part of https://issues.apache.org/jira/browse/APEXCORE-724 and
https://issues.apache.org/jira/browse/APEXCORE-796.

Regards,
Ananth

On Fri, May 4, 2018 at 10:31 AM, Vlad Rozov <vro...@apache.org> wrote:

> +1 to all 3.
>
> Thank you,
>
> Vlad
>
>
> On 5/3/18 07:03, Thomas Weise wrote:
>
>> +1 to all of this
>>
>> There are existing JIRAs that you can assign / add to:
>>
>> https://issues.apache.org/jira/browse/APEXCORE-727
>>
>> Thanks!
>>
>>
>>
>> On Thu, May 3, 2018 at 4:26 AM, Chinmay Kolhatkar <chin...@apache.org>
>> wrote:
>>
>>> Hello Community,
>>>
>>> I want to propose the following improvements for the apex-core build and
>>> related steps:
>>>
>>> 1. Most (probably all) open source projects provide a binary release
>>> package of the software and not just a source release package. Currently
>>> we have only a source package. Luckily, there are a few places (outside of
>>> Apache Apex) where binary packages of Apex have been created for different
>>> purposes: https://github.com/atrato/apex-cli-package and
>>> https://github.com/apache/bigtop.
>>>
>>> The proposal here is to generate this binary release package as part of
>>> the build process of apex-core.
>>>
>>>
>>> 2. Currently, the Docker image for Apex is built from one of my personal
>>> repositories (https://github.com/chinmaykolhatkar/docker-pool).
>>> While I don't mind hosting the content (Dockerfile etc.) in my
>>> repository, I believe it makes sense to host it in the apex-core
>>> repository. This way, there is a possibility of using Docker GitHub
>>> triggers for building the Docker image from release branches.
>>>
>>>
>>> 3. Currently the Docker build uses Hadoop- and Apex-specific packages from
>>> the Bigtop deb repo and CI. (See
>>> https://github.com/chinmaykolhatkar/docker-pool/blob/master/apex/ubuntu/app/setup.sh
>>> for more details.)
>>> While the use of Hadoop packages from the Bigtop repo is fine, we also
>>> need to rely on a Bigtop contribution to update the Apex component and
>>> then on a build from Bigtop CI to get the apex.deb package. Basically, our
>>> Docker image generation process gets blocked on a Bigtop source update to
>>> generate the updated Apex deb.
>>> As we technically don't need to depend on Bigtop to generate the Apex
>>> binary, the proposal here is to generate the binary package during the
>>> build process (point 1) and use that during the Docker image build process
>>> instead of using the ready-made deb package from Bigtop CI.
>>>
>>>
>>> I understand that there are multiple items being mentioned in a single
>>> mail, but they seem related, hence the single mail.
>>>
>>> Please let me know your opinion on the above items.
>>>
>>> Thanks,
>>> Chinmay.
>>>
>>>
>
