Thanks for your explanation. I think the proposal is reasonable.

On Thu, Dec 12, 2019 at 3:32 AM Yangze Guo <karma...@gmail.com> wrote:

> Thanks for the feedback, Gary.
>
> Regarding the WordCount test:
> - True. It adds no test coverage beyond the other tests.
> However, I think each test case should have a single purpose so
> that we can find the root cause of a failure quickly. As discussed in
> FLINK-15135[1], I prefer including only the WordCount test as the first
> step. If the time overhead of the E2E tests becomes severe in the future, I
> agree to remove it. WDYT?
> - I think the main overhead comes from building the image. The
> subsequent tests will run fast since they will not build it again.
>
> Regarding the RocksDB test, I think it is a typical scenario that uses
> off-heap memory. The main purpose is to verify the memory usage and
> memory configuration in Mesos mode. The two typical use cases are off-heap
> and on-heap. Thus, I think the following two test cases are worth
> including:
> - A streaming task using the heap backend. It should explicitly set
> “taskmanager.memory.managed.size” to zero to check for potential
> unexpected usage of off-heap memory.
> - A streaming task using the RocksDB backend. It covers the scenario that
> uses off-heap memory.
>
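As a rough illustration of the two memory test cases listed above, the sketch below generates the configuration entries each case might use. Only the key taskmanager.memory.managed.size and the intent to set it to zero come from this thread; the helper names, the state.backend values and the literal "0" are assumptions for illustration, not the actual test setup.

```python
# Hypothetical helpers (not part of Flink or its test scripts).
# Assumption: the backend is selected through "state.backend" with the
# values "filesystem" (heap) and "rocksdb".

MANAGED_SIZE_KEY = "taskmanager.memory.managed.size"


def memory_test_config(backend: str) -> dict:
    """Return config entries for one of the two proposed test cases."""
    if backend == "heap":
        # Heap backend: managed (off-heap) memory explicitly set to zero so
        # that unexpected off-heap usage would surface in the test.
        return {"state.backend": "filesystem", MANAGED_SIZE_KEY: "0"}
    if backend == "rocksdb":
        # RocksDB backend: leaves managed memory at its default, which is
        # the off-heap scenario.
        return {"state.backend": "rocksdb"}
    raise ValueError(f"unknown backend: {backend}")


def to_flink_conf(config: dict) -> str:
    """Render entries as 'key: value' lines, one per entry."""
    return "\n".join(f"{key}: {value}" for key, value in config.items())
```

Calling to_flink_conf(memory_test_config("heap")) yields two "key: value" lines that a test script could append to its configuration file; the RocksDB case yields only the backend line.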
> Look forward to your kind feedback.
>
> [1]https://issues.apache.org/jira/browse/FLINK-15135
>
> Best,
> Yangze Guo
>
>
>
> On Wed, Dec 11, 2019 at 6:14 PM Gary Yao <g...@ververica.com> wrote:
> >
> > Thanks for driving this effort. Also +1 from my side. I have left a few
> > questions below.
> >
> > > - Wordcount end-to-end test. For verifying the basic process of Mesos
> > > deployment.
> >
> > Would this add additional test coverage compared to the
> > "multiple submissions" test case? I am asking because the E2E tests are
> > already expensive to run, and adding new tests should be carefully
> > considered.
> >
> > > - State TTL RocksDB backend end-to-end test. For verifying memory
> > > configuration behaviors, since Mesos has its own config options and
> > > logic.
> >
> > Can you elaborate more on this? Which config options are relevant here?
> >
> > On Wed, Dec 11, 2019 at 9:58 AM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > > +1 for building the image locally. If the need arises, we can
> > > always change it later.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Dec 11, 2019 at 4:05 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > >
> > > > Thanks, Yangze.
> > > >
> > > > +1 for building the image locally.
> > > > The time consumption for both building the image locally and pulling
> > > > it from DockerHub sounds reasonable and affordable. Therefore, I'm
> > > > also in favor of avoiding the cost of maintaining a custom image.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Wed, Dec 11, 2019 at 10:11 AM Yangze Guo <karma...@gmail.com> wrote:
> > > >
> > > > > Thanks for the feedback, Yang.
> > > > >
> > > > > Some updates I want to share in this thread.
> > > > > I have built a PoC version of the Mesos e2e test with the WordCount
> > > > > workflow.[1] Then, I ran it in the testing environment. The results
> > > > > are shown here[2]:
> > > > > - Pulling the image from DockerHub took 1 minute and 21 seconds.
> > > > > - Building it locally took 2 minutes and 54 seconds.
> > > > >
> > > > > I prefer building it locally. Although it is slower, I think the
> > > > > time overhead of either building or pulling the image is trivial
> > > > > compared to the cost of maintaining the image on DockerHub and to
> > > > > the duration of the whole test process.
> > > > >
> > > > > I look forward to hearing from you. ;)
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > [1]https://github.com/KarmaGYZ/flink/commit/0406d942446a1b17f81d93235b21a829bf88ccf0
> > > > > [2]https://travis-ci.org/KarmaGYZ/flink/jobs/623207957
> > > > >
> > > > > On Mon, Dec 9, 2019 at 2:39 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks Yangze for starting this discussion.
> > > > > >
> > > > > > Just sharing my thoughts.
> > > > > >
> > > > > > If the official Mesos Docker image cannot meet our requirements, I
> > > > > > suggest building the image locally. We have done the same thing for
> > > > > > the YARN e2e tests. This approach is more flexible and easier to
> > > > > > maintain. However, I have no idea how long building the Mesos image
> > > > > > locally will take. Based on previous experience with YARN, I think
> > > > > > it may not take too much time.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > On Sat, Dec 7, 2019 at 4:25 PM Yangze Guo <karma...@gmail.com> wrote:
> > > > > >>
> > > > > >> Thanks for your feedback!
> > > > > >>
> > > > > >> @Till
> > > > > >> Regarding the time overhead, I think it mainly comes from the
> > > > > >> network transmission. Building the image locally downloads 260MB
> > > > > >> of files in total, including the base image and packages. For
> > > > > >> pulling from DockerHub, the compressed size of the image is 347MB.
> > > > > >> Thus, I agree that it is ok to build the image locally.
> > > > > >>
> > > > > >> @Piyush
> > > > > >> Thank you for offering to help and for sharing your usage scenario.
> > > > > >> At the current stage, I think it would be really helpful if you
> > > > > >> could compress the custom image[1] or reduce the time overhead of
> > > > > >> building it locally.
> > > > > >> Any ideas for improving test coverage will also be appreciated.
> > > > > >>
> > > > > >> [1]https://hub.docker.com/layers/karmagyz/mesos-flink/latest/images/sha256-4e1caefea107818aa11374d6ac8a6e889922c81806f5cd791ead141f18ec7e64
> > > > > >>
> > > > > >> Best,
> > > > > >> Yangze Guo
> > > > > >>
> > > > > >> On Sat, Dec 7, 2019 at 3:17 AM Piyush Narang <p.nar...@criteo.com> wrote:
> > > > > >> >
> > > > > >> > +1 from our end as well. At Criteo, we are running some Flink
> > > > > >> > jobs on Mesos in production to compute short term features for
> > > > > >> > machine learning. We’d love to help out and contribute on this
> > > > > >> > initiative.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > -- Piyush
> > > > > >> >
> > > > > >> >
> > > > > >> > From: Till Rohrmann <trohrm...@apache.org>
> > > > > >> > Date: Friday, December 6, 2019 at 8:10 AM
> > > > > >> > To: dev <dev@flink.apache.org>
> > > > > >> > Cc: user <u...@flink.apache.org>
> > > > > >> > Subject: Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration
> > > > > >> >
> > > > > >> > Big +1 for adding a fully working e2e test for Flink's Mesos
> > > > > >> > integration. Ideally we would have it ready for the 1.10 release.
> > > > > >> > The lack of such a test has bitten us already multiple times.
> > > > > >> >
> > > > > >> > In general I would prefer to use the official image if possible
> > > > > >> > since it frees us from maintaining our own custom image. Since
> > > > > >> > Java 9 is no longer officially supported as we opted for
> > > > > >> > supporting Java 11 (LTS) it might not be feasible, though. How
> > > > > >> > much longer would building the custom image take compared to
> > > > > >> > downloading it from DockerHub? Maybe it is ok to build the image
> > > > > >> > locally. Then we would not have to maintain the image.
> > > > > >> >
> > > > > >> > Cheers,
> > > > > >> > Till
> > > > > >> >
> > > > > >> > On Fri, Dec 6, 2019 at 11:05 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > > >> > Hi, all,
> > > > > >> >
> > > > > >> > Currently, there is no end-to-end test or IT case for Mesos
> > > > > >> > deployment, while common deployment-related development
> > > > > >> > inevitably touches the logic of this component. Thus, some work
> > > > > >> > needs to be done to guarantee a good experience for both Mesos
> > > > > >> > users and contributors. After an offline discussion with Till and
> > > > > >> > Xintong, we have some basic ideas and would like to start a
> > > > > >> > discussion thread on adding end-to-end tests for Flink's Mesos
> > > > > >> > integration.
> > > > > >> >
> > > > > >> > As a first step, we would like to keep the scope of this
> > > > > >> > contribution relatively small. This may also help us quickly get
> > > > > >> > some basic test cases that might be helpful for the upcoming 1.10
> > > > > >> > release.
> > > > > >> >
> > > > > >> > As far as we can tell, what needs to be done is to set up a Mesos
> > > > > >> > framework during testing and to determine which tests need to be
> > > > > >> > included.
> > > > > >> >
> > > > > >> >
> > > > > >> > ** Regarding the Mesos framework, after trying out several
> > > > > >> > approaches, I find that setting up Mesos in Docker is probably
> > > > > >> > what we want. The resources needed for building and setting up
> > > > > >> > Mesos from source are probably not affordable in most scenarios.
> > > > > >> > So, the one open question worth discussing is the choice of
> > > > > >> > Docker image. We have come up with two options.
> > > > > >> >
> > > > > >> > - Using the official Mesos image[1]
> > > > > >> > The official image was the first alternative that came to mind,
> > > > > >> > but we ran into a Java version compatibility problem that leads
> > > > > >> > to failures when launching task executors. Flink has supported
> > > > > >> > Java 9 since version 1.9.0[2]. However, the official Docker image
> > > > > >> > of Mesos is built with a development version of JDK 9, which has
> > > > > >> > probably caused this problem. Unless we want to make Flink
> > > > > >> > compatible with the JDK development version used by the official
> > > > > >> > Mesos image as well, this option does not work out. Besides,
> > > > > >> > according to the official roadmap[5], Java 9 is not a long-term
> > > > > >> > support version, which may bring stability risks in the future.
> > > > > >> >
> > > > > >> > - Build a custom image
> > > > > >> > I've already tried building a custom image[3] and successfully
> > > > > >> > ran most of the existing end-to-end test cases with it. The image
> > > > > >> > is built with Ubuntu 16.04, JDK 8 and Mesos 1.7.1. For the Mesos
> > > > > >> > e2e test framework, we could either build the image from a
> > > > > >> > Dockerfile or pull the pre-built image from DockerHub (or other
> > > > > >> > hub services) during testing.
> > > > > >> > If we decide to publish an image on DockerHub, we probably need
> > > > > >> > an official Flink repository/account to hold it.
> > > > > >> >
> > > > > >> >
> > > > > >> > ** Regarding the test coverage, we think the following three
> > > > > >> > tests could be a good starting point that covers an essential set
> > > > > >> > of behaviors for Mesos deployment.
> > > > > >> > - Wordcount end-to-end test. For verifying the basic process of
> > > > > >> > Mesos deployment.
> > > > > >> > - Multiple submissions of the same job. For preventing resource
> > > > > >> > management problems on Mesos, such as [4]
> > > > > >> > - State TTL RocksDB backend end-to-end test. For verifying memory
> > > > > >> > configuration behaviors, since Mesos has its own config options
> > > > > >> > and logic.
> > > > > >> >
> > > > > >> > Unfortunately, none of us who participated in the initial offline
> > > > > >> > discussion has much experience running Flink on Mesos in
> > > > > >> > production. It would be good if users and experts who actually
> > > > > >> > use Flink on Mesos could join the discussion and provide
> > > > > >> > feedback. Any feedback, ideas, suggestions, concerns and
> > > > > >> > questions are welcome and appreciated.
> > > > > >> >
> > > > > >> >
> > > > > >> > BTW, we would like to run a survey on the usage of Flink on Mesos
> > > > > >> > in the community. From Flink on Mesos users, we would like to
> > > > > >> > learn:
> > > > > >> > - Which version of Mesos do you use, and what setups (such as
> > > > > >> > Marathon) do you need for Mesos?
> > > > > >> > - Is it the Flink job cluster or the session cluster that is
> > > > > >> > mainly used?
> > > > > >> > - What is the scale of the Flink / Mesos cluster?
> > > > > >> >
> > > > > >> >
> > > > > >> > [1]https://hub.docker.com/r/mesosphere/mesos
> > > > > >> > [2]https://issues.apache.org/jira/browse/FLINK-11307
> > > > > >> > [3]https://hub.docker.com/repository/docker/karmagyz/mesos-flink
> > > > > >> > [4]https://issues.apache.org/jira/browse/FLINK-14074
> > > > > >> > [5]https://www.oracle.com/technetwork/java/java-se-support-roadmap.html
> > > > > >> >
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > Yangze Guo
