Thanks, Yangze, for starting this discussion. Just sharing my thoughts.
If the official Mesos Docker image cannot meet our requirements, I suggest building the image locally. We have done the same for the YARN e2e tests. This approach is more flexible and easier to maintain. However, I have no idea how long building the Mesos image locally will take. Based on our previous experience with YARN, I think it should not take too much time.

Best,
Yang

Yangze Guo <karma...@gmail.com> wrote on Sat, Dec 7, 2019 at 4:25 PM:

> Thanks for your feedback!
>
> @Till
> Regarding the time overhead, I think it mainly comes from network transmission. Building the image locally downloads 260MB of files in total, including the base image and packages. When pulling from DockerHub, the compressed size of the image is 347MB. Thus, I agree that it is OK to build the image locally.
>
> @Piyush
> Thank you for offering to help and for sharing your usage scenario. At the current stage, I think it would be really helpful if you could compress the custom image[1] or reduce the time overhead of building it locally. Any ideas for improving test coverage would also be appreciated.
>
> [1] https://hub.docker.com/layers/karmagyz/mesos-flink/latest/images/sha256-4e1caefea107818aa11374d6ac8a6e889922c81806f5cd791ead141f18ec7e64
>
> Best,
> Yangze Guo
>
> On Sat, Dec 7, 2019 at 3:17 AM Piyush Narang <p.nar...@criteo.com> wrote:
> >
> > +1 from our end as well. At Criteo, we are running some Flink jobs on Mesos in production to compute short-term features for machine learning. We'd love to help out and contribute to this initiative.
> >
> > Thanks,
> > -- Piyush
> >
> > From: Till Rohrmann <trohrm...@apache.org>
> > Date: Friday, December 6, 2019 at 8:10 AM
> > To: dev <dev@flink.apache.org>
> > Cc: user <u...@flink.apache.org>
> > Subject: Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration
> >
> > Big +1 for adding a fully working e2e test for Flink's Mesos integration. Ideally we would have it ready for the 1.10 release.
> > The lack of such a test has already bitten us multiple times.
> >
> > In general, I would prefer to use the official image if possible, since it frees us from maintaining our own custom image. Since Java 9 is no longer officially supported, as we opted to support Java 11 (LTS), it might not be feasible, though. How much longer would building the custom image take compared to downloading it from DockerHub? Maybe it is OK to build the image locally. Then we would not have to maintain the image.
> >
> > Cheers,
> > Till
> >
> > On Fri, Dec 6, 2019 at 11:05 AM Yangze Guo <karma...@gmail.com> wrote:
> > Hi, all,
> >
> > Currently, there is no end-to-end test or IT case for Mesos deployment, while common deployment-related development inevitably touches the logic of this component. Thus, some work needs to be done to guarantee a good experience for both Mesos users and contributors. After an offline discussion with Till and Xintong, we have some basic ideas and would like to start a discussion thread on adding end-to-end tests for Flink's Mesos integration.
> >
> > As a first step, we would like to keep the scope of this contribution relatively small. This may also help us quickly get some basic test cases that might be helpful for the upcoming 1.10 release.
> >
> > As far as we can think of, what needs to be done is to set up a Mesos framework during testing and determine which tests need to be included.
> >
> >
> > ** Regarding the Mesos framework: after trying out several approaches, I found that setting up Mesos in Docker is probably what we want. The resources needed for building and setting up Mesos from source are probably not affordable in most scenarios. So, the one open question worth discussing is the choice of Docker image. We have come up with two options.
> >
> > - Using the official Mesos image[1]
> > The official image was the first alternative that came to mind, but we ran into a Java version compatibility problem that causes task executors to fail to launch. Flink has supported Java 9 since version 1.9.0[2]. However, the official Mesos Docker image is built with a development version of JDK 9, which has probably caused this problem. Unless we want to make Flink also compatible with the JDK development version used by the official Mesos image, this option does not work out. Besides, according to the official roadmap[5], Java 9 is not a long-term support version, which may bring stability risks in the future.
> >
> > - Building a custom image
> > I've already tried building a custom image[3] and successfully ran most of the existing end-to-end test cases with it. The image is built with Ubuntu 16.04, JDK 8 and Mesos 1.7.1. For the Mesos e2e test framework, we could either build the image from a Dockerfile or pull the pre-built image from DockerHub (or another hub service) during testing. If we decide to publish an image on DockerHub, we probably need an official Flink repository/account to hold it.
> >
> >
> > ** Regarding the test coverage, we think the following three tests could be a good starting point that covers an essential set of behaviors for Mesos deployment:
> > - WordCount end-to-end test, for verifying the basic process of Mesos deployment.
> > - Multiple submissions of the same job, for preventing resource management problems on Mesos, such as [4].
> > - State TTL RocksDB backend end-to-end test, for verifying memory configuration behaviors, since Mesos has its own config options and logic.
> >
> > Unfortunately, none of us who participated in the initial offline discussion has much experience running Flink on Mesos in production.
> > It would be good if users and experts who actually run Flink on Mesos could join the discussion and provide some feedback. Any feedback, ideas, suggestions, concerns and questions are welcome and appreciated.
> >
> >
> > BTW, we would like to run a survey on the usage of Flink on Mesos in the community. From Flink on Mesos users, we would like to learn:
> > - Which version of Mesos do you use, and what setups (such as Marathon) do you need for Mesos?
> > - Is it mainly a Flink job cluster or a session cluster that you use?
> > - What is the scale of your Flink / Mesos cluster?
> >
> >
> > [1] https://hub.docker.com/r/mesosphere/mesos
> > [2] https://issues.apache.org/jira/browse/FLINK-11307
> > [3] https://hub.docker.com/repository/docker/karmagyz/mesos-flink
> > [4] https://issues.apache.org/jira/browse/FLINK-14074
> > [5] https://www.oracle.com/technetwork/java/java-se-support-roadmap.html
> >
> >
> > Best,
> > Yangze Guo
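For illustration, the two image options discussed in this thread (building the Mesos image locally from a Dockerfile vs. pulling a pre-built image from DockerHub) could be wrapped in a small helper for a test harness. This is a minimal Python sketch, not Flink's actual e2e tooling: the image tag and Dockerfile directory are hypothetical placeholders, and only the standard `docker build -t <tag> <context>` and `docker pull <image>` CLI commands are assumed. The download sizes are the figures Yangze reported in this thread.

```python
import subprocess
from typing import List

# Hypothetical placeholders for illustration; not names used by Flink's e2e scripts.
IMAGE_TAG = "mesos-flink-e2e:latest"
DOCKERFILE_DIR = "docker/mesos"

# Approximate data transferred per option, as reported in this thread (MB).
DOWNLOAD_MB = {"build": 260, "pull": 347}


def image_setup_command(build_locally: bool,
                        image: str = IMAGE_TAG,
                        dockerfile_dir: str = DOCKERFILE_DIR) -> List[str]:
    """Return the docker CLI invocation that makes `image` available locally."""
    if build_locally:
        # Build from the Dockerfile in `dockerfile_dir`: nothing has to be
        # published to DockerHub, but the build cost is paid whenever the
        # Docker layer cache is cold.
        return ["docker", "build", "-t", image, dockerfile_dir]
    # Pull a pre-built image: someone has to publish and maintain it under
    # a DockerHub repository/account.
    return ["docker", "pull", image]


def setup_image(build_locally: bool) -> None:
    """Run the chosen command; raises CalledProcessError if docker fails."""
    subprocess.run(image_setup_command(build_locally), check=True)
```

Calling `setup_image(build_locally=True)` corresponds to the build-locally preference expressed by Till and Yang. With the sizes above, it transfers about 87MB less than pulling (260MB vs. 347MB), although the total time still depends on how long the build itself takes on the test machine.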