Since we use the Flink bot to trigger the CI tests, could we add a command that lets the Flink bot retrigger CI? (Sometimes we run into flaky tests.)
Best,
Congxian

Chesnay Schepler <ches...@apache.org> wrote on Mon, Jul 8, 2019 at 5:01 AM:

> The vote has passed unanimously in favor of migrating to a separate Travis account.
>
> I will now set things up so that PullRequests are no longer run on the ASF servers.
> This is a major step in reducing our usage of ASF resources.
> For the time being we'll use the free Travis plan for flink-ci (i.e. 5 workers, which is the same as the ASF gives us). Over the course of the next week we'll set up the Ververica subscription to increase this limit.
>
> From now on, a bot will mirror all new and updated PullRequests to a mirror repository (https://github.com/flink-ci/flink-ci) and write an update into the PR once the build is complete.
> I have run the bots for the past 3 days in parallel with our existing Travis setup, and they worked without major issues.
>
> The biggest change that contributors will see is that there's no longer an icon next to each commit. We may revisit this in the future.
>
> I'll set up a repo with the source of the bot later.

On 04/07/2019 10:46, Chesnay Schepler wrote:

> I've raised a JIRA <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to inquire whether it would be possible to switch to a different Travis account, and if so, what steps would need to be taken.
> We need a proper confirmation from INFRA since we are not in full control of the flink repository (for example, we cannot access the settings page).
>
> If this is indeed possible, Ververica is willing to sponsor a Travis account for the Flink project.
> This would provide us with more resources than we need.
>
> Since this makes the project more reliant on resources provided by external companies, I would like to vote on this.
>
> Please vote on this proposal, as follows:
> [ ] +1, Approve the migration to a Ververica-sponsored Travis account, provided that INFRA approves
> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis account
>
> The vote will be open for at least 24h, and until we have confirmation from INFRA. The voting period may be shorter than the usual 3 days since our current setup is effectively not working.

On 04/07/2019 06:51, Bowen Li wrote:

> Re: "Are they using their own Travis CI pool, or did they switch to an entirely different CI service?"
>
> I reached out to Wes and Krisztián from the Apache Arrow PMC. They are currently moving away from ASF's Travis to their own in-house bare-metal machines at [1] with a custom CI application at [2]. They've seen significant improvement, with both much higher performance and basically no resource waiting time: a "night-and-day" difference, to quote Wes.
>
> Re: "If we can just switch to our own Travis pool, just for our project, then this might be something we can do fairly quickly?"
>
> I believe so, according to [3] and [4].
>
> [1] https://ci.ursalabs.org/
> [2] https://github.com/ursa-labs/ursabot
> [3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com

On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler <ches...@apache.org> wrote:

> Are they using their own Travis CI pool, or did they switch to an entirely different CI service?
>
> If we can just switch to our own Travis pool, just for our project, then this might be something we can do fairly quickly?

On 03/07/2019 05:55, Bowen Li wrote:

> I responded in the INFRA ticket [1] that I believe they are using the wrong metric against Flink: total build time is a completely different thing from guaranteed build capacity.
>
> My response:
>
> "As mentioned above, since I started to pay attention to Flink's build queue a few tens of days ago (I'm in Seattle), I saw no build kicking off for Flink during PST daytime on weekdays. Our teammates in China and Europe have also reported similar observations. So we need to evaluate where the large total build time came from. If both 1) your number and 2) our observations from three locations that cover pretty much a full day are true, I **guess** one reason could be that the extra build time most likely came from weekends, when other Apache projects may be idle and Flink just drains its congested queue hard.
>
> Please be aware that we're not complaining about the lack of resources in general; I'm complaining about the lack of **stable, dedicated** resources. An example of the latter: currently, even if no build is in Flink's queue and I submit a request to be the queue head on a PST morning, my build won't even start for 6-8+ hours. That is an absurd amount of waiting time.
>
> That said, if ASF INFRA decides to adopt a quota system and grants Flink five DEDICATED servers that run all the time only for Flink, that'll be PERFECT and would totally solve our problem now.
>
> I feel what's missing in ASF INFRA's Travis resource pool is some level of build capacity SLAs and certainty."
>
> Again, I believe these two problems differ in nature: long build time vs. lack of dedicated build resources. That said, shortening build time may or may not relieve the situation. I'm slightly negative on disabling IT cases for PRs, because the downside is that we are exposed to any bugs in PRs that UTs don't catch, which may cost a lot more to fix and may slow down or even block others; but I'm open to others' opinions on it.
>
> AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible way to solve our problem, since INFRA's pool is fully shared and they have no control over, or fine-grained insight into, resource allocation to a specific Apache project. As mentioned in [1], Apache Arrow is moving away from the ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to do so). I know that Spark is on its own build infra. If we all agree on funding our own build infra, I'd be glad to help investigate potential options after releasing 1.9, since I'm super busy with 1.9 now.
>
> [1] https://issues.apache.org/jira/browse/INFRA-18533

On Tue, Jul 2, 2019 at 4:46 AM Chesnay Schepler <ches...@apache.org> wrote:

> As a short-term stopgap, since we can assume this issue will become much worse in the following days/weeks, we could disable IT cases in PRs and only run them on master.

On 02/07/2019 12:03, Chesnay Schepler wrote:

> People really have to stop thinking that just because something works for us it is also a good solution.
> Also, please remember that our builds run for 2h from start to finish, not the 14 _minutes_ it takes for Zeppelin.
> We are dealing with an entirely different scale here, both in terms of build times and number of builds.
>
> In this very thread people have been complaining about long queue times for their builds. Surprise: other Apache projects have been suffering from the very same thing due to us not controlling our build times. While switching services (be it Jenkins, CircleCI or whatever) will possibly work for us (and these options are actually attractive, like CircleCI's proper support for build artifacts), it will also likely result in us negatively affecting other projects in significant ways.
>
> Sure, the Jenkins setup has a good user experience for us, at the cost of blocking Jenkins workers for a _lot_ of time. Right now we have 25 PRs in our queue; that's possibly 50h of Jenkins resources we'd consume, and the European contributors haven't even really started yet.
>
> FYI, the latest INFRA response from INFRA-18533:
>
> "Our rough metrics show that Flink used over 5800 hours of build time last month.
> That is equal to EIGHT servers running 24/7 for the ENTIRE MONTH. EIGHT. Nonstop.
> When we discovered this last night, we discussed it some and are going to tune down Flink to allow only five executors maximum. We cannot allow Flink to consume so much of a Foundation shared resource."
>
> So yes, we either
> a) have to heavily reduce our CI usage, or
> b) fund our own, either maintaining it ourselves or donating to Apache.

On 02/07/2019 05:11, Bowen Li wrote:

> Looking at the git history of the Jenkins script, its core part was finished in March 2017 (with only two minor updates in 2017/2018), so it's been running for over two years now, and it feels like the Zeppelin community has been quite happy with it. @Jeff Zhang <zjf...@gmail.com>, can you share your insights and user experience with the Jenkins+Travis approach?
>
> Things like:
>
> - Has the approach completely solved the resource capacity problem for the Zeppelin community? Is the Zeppelin community happy with the result?
> - Is the whole configuration chain stable enough (e.g. uptime)?
> - How often do you need to maintain the Jenkins infra? How many people are usually involved in maintenance and bug fixes?
>
> To me, the downside of this approach seems to be mostly maintenance: maintaining the script and the Jenkins infra.
>
> ** Having Our Own Travis-CI.com Account **
>
> Another alternative I've been thinking of is to have our own travis-ci.com account with paid dedicated resources.
> Note that travis-ci.org is the free version and travis-ci.com is the commercial version. We currently use a shared resource pool managed by the ASF INFRA team on travis-ci.org, but we have no control over it: we can't see how it's configured, how many resources are available, how resources are allocated among Apache projects, etc.
> The nice things about having an account on travis-ci.com are:
>
> - relatively low cost with a much better resource guarantee than what we currently have [1]: $249/month for 5 dedicated concurrent jobs, $489/month for 10
> - low maintenance work compared to using Jenkins
> - (potentially) no migration cost according to Travis's docs [2] (pending verification)
> - full control over the build capacity/configuration compared to using ASF INFRA's pool
>
> I'd be surprised if a community as vibrant as ours cannot find and fund $249*12 = $2988 a year in exchange for a much better developer experience and much higher productivity.
>
> [1] https://travis-ci.com/plans
> [2] https://docs.travis-ci.com/user/migrate/open-source-repository-migration

On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ches...@apache.org> wrote:

> So yes, the Jenkins job keeps polling the state from Travis until it finishes.
>
> Not sure I'm comfortable with the idea of using Jenkins workers just to idle for several hours.
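Chesnay's reading of the Zeppelin setup, a Jenkins job that keeps polling Travis until the build finishes, explains why a Jenkins worker stays busy for the whole length of a Travis build. Below is a minimal Python sketch of such a polling loop, assuming a caller-supplied `fetch_state` function in place of whatever travis_check.py actually queries. The function names, state names, interval, and timeout are illustrative assumptions, not Zeppelin's code.

```python
import time
from typing import Callable

# Assumed terminal build states; the real Travis API may name them differently.
TERMINAL_STATES = {"passed", "failed", "errored", "canceled"}


def wait_for_build(
    fetch_state: Callable[[], str],
    poll_interval_s: float = 60.0,
    timeout_s: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_state() until the build reaches a terminal state.

    The calling worker is blocked for the entire build: this is the idle
    executor time discussed above, spent waiting while Travis does the work.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"build still {state!r} after {timeout_s} s")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; with the defaults, the loop holds its worker for the full build duration, which is exactly the cost Chesnay objects to.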
On 29/06/2019 14:56, Jeff Zhang wrote:

> Here's what the Zeppelin community did: we made a Python script to check the build status of a pull request.
> Here's the script:
> https://github.com/apache/zeppelin/blob/master/travis_check.py
>
> And this is the script we use in the Jenkins build job:
>
>     if [ -f "travis_check.py" ]; then
>       git log -n 1
>       # Extract the PR URL and the author's fork URL from the Jenkins build page
>       STATUS=$(curl -s "$BUILD_URL" | grep -e "GitHub pull request.*from.*" | sed 's/.*GitHub pull request <a href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
>       AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
>       PR=$(echo $STATUS | awk '{print $1}' | sed 's/.*[/]\(.*\)$/\1/g')
>       #COMMIT=$(git log -n 1 | grep "^Merge:" | awk '{print $3}')
>       #if [ -z "$COMMIT" ]; then
>       #  COMMIT=$(curl -s https://api.github.com/repos/apache/zeppelin/pulls/$PR | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>       #fi
>
>       # get the commit hash from the PR
>       COMMIT=$(curl -s "https://api.github.com/repos/apache/zeppelin/pulls/$PR" | grep -e "\"label\":" -e "\"ref\":" -e "\"sha\":" | tr '\n' ' ' | sed 's/\(.*sha[^,]*,\)\(.*ref.*\)/\1 = \2/g' | tr = '\n' | grep -v "apache:" | sed 's/.*sha.[^"]*["]\([^"]*\).*/\1/g')
>       sleep 30 # give Travis a moment to start the build
>       RET_CODE=0
>       python ./travis_check.py "${AUTHOR}" "${COMMIT}" || RET_CODE=$?
>       if [ "$RET_CODE" -eq 2 ]; then # retry with the repository owner's name when travis-ci is not available in the account
>         RET_CODE=0
>         AUTHOR=$(curl -s "https://api.github.com/repos/apache/zeppelin/pulls/$PR" | grep '"full_name":' | grep -v "apache/zeppelin" | sed 's/.*[:][^"]*["]\([^/]*\).*/\1/g')
>         python ./travis_check.py "${AUTHOR}" "${COMMIT}" || RET_CODE=$?
>       fi
>
>       if [ "$RET_CODE" -eq 2 ]; then # fail: no build information found on Travis
>         set +x
>         echo "-----------------------------------------------------"
>         echo "Looks like travis-ci is not configured for your fork."
>         echo "Please set it up by switching on the 'zeppelin' repository at https://travis-ci.org/profile."
>         echo "Then make sure the 'Build branch updates' option is enabled in the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings."
>         echo ""
>         echo "To trigger CI after setup, you will need to amend your last commit with"
>         echo "git commit --amend"
>         echo "git push your-remote HEAD --force"
>         echo ""
>         echo "See http://zeppelin.apache.org/contribution/contributions.html#continuous-integration."
>       fi
>
>       exit $RET_CODE
>     else
>       set +x
>       echo "travis_check.py does not exist"
>       exit 1
>     fi

Chesnay Schepler <ches...@apache.org> wrote on Sat, Jun 29, 2019 at 3:17 PM:

> Does this imply that a Jenkins job is active as long as the Travis build runs?

On 26/06/2019 21:28, Bowen Li wrote:

> Hi,
>
> @Dawid, I think the long-running tests, which I mentioned in my first email and which you also pointed out, belong to "a big effort which is much harder to accomplish in a short period of time and may deserve its own separate discussion". Thus I didn't include them in what we can do in the foreseeable short term.
>
> Besides, I don't think that's the ultimate reason for the lack of build resources. Even if the build is shortened to something like 2h, the problem I described, where no build machine works for about 6 or more hours during PST daytime, will still happen, because no machine from ASF INFRA's pool is allocated to Flink. Having paid close attention to the build queue over the past few weekdays, I see a pretty clear pattern now.
>
> **The ultimate root cause** is that we don't have any **dedicated** build resources that we can reliably depend on. I'm actually OK with waiting a long time if there are build requests running, since it means at least we are making progress.
> But I'm not OK with having no build resources at all. In the short term, I think we should aim to always have at least a central pool (of, say, 3 or 5) machines dedicated to building Flink at any time, or maybe to use users' resources.
>
> @Chesnay @Robert I synced with Jeff offline: the Zeppelin community is using a Jenkins job to automatically build on users' Travis accounts and link the result back to the GitHub PR. I guess the Jenkins job fetches the latest upstream master and builds the PR against it. Jeff has filed tickets to learn about and get access to the Jenkins infra. It'd be better to fully understand it first before judging this approach.
>
> I also heard good things about CircleCI, and ASF INFRA seems to have a pool of build capacity there too. It could be an alternative to consider.

On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

> Sorry to jump in late, but I think Bowen's summary missed the most important point from Chesnay's previous message. The ultimate reason for all the problems is that the tests already take close to 2 hours to run.
> I fully support this claim: "Unless people start caring about test times before adding them, this issue cannot be solved."
>
> This is another reason why using users' Travis accounts won't help. Every few weeks we reach the user account's time limit for a single profile. This makes users' builds simply fail until we either properly decrease the time the tests take (which I am not sure we ever did) or postpone the problem by splitting the build into more profiles. (Note that the ASF Travis account has higher time limits.)
>
> Best,
>
> Dawid

On 26/06/2019 09:36, Robert Metzger wrote:

> Do we know if using "the best" available hardware would improve the build times?
> Imagine we ran the build on machines with plenty of main memory to mount everything to a ramdisk, plus the latest CPU architecture.
>
> Throwing hardware at the problem could help reduce the time of an individual build, and using our own infrastructure would remove our dependency on Apache's Travis account (with the obvious downside of having to maintain the infrastructure).
> We could use an open source Travis alternative to have a similar experience and make the migration easy.
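Dawid's point about postponing the per-job time limit by splitting the build into more profiles amounts to a bin-packing problem. Below is a minimal sketch, assuming each test module's runtime is known in advance. It uses the first-fit-decreasing heuristic, and the module names, durations, and limit are hypothetical, not Flink's actual profiles.

```python
from typing import Dict, List


def split_into_profiles(durations_min: Dict[str, int], limit_min: int) -> List[List[str]]:
    """Pack test modules into profiles whose total runtime stays <= limit_min.

    First-fit decreasing: take modules longest first and put each one into the
    first profile that still has room, opening a new profile when none does.
    """
    profiles: List[List[str]] = []
    totals: List[int] = []
    # Sort by runtime descending; break ties by name for deterministic output.
    for name, minutes in sorted(durations_min.items(), key=lambda kv: (-kv[1], kv[0])):
        if minutes > limit_min:
            raise ValueError(f"{name} alone ({minutes} min) exceeds the {limit_min} min limit")
        for i, total in enumerate(totals):
            if total + minutes <= limit_min:
                profiles[i].append(name)
                totals[i] += minutes
                break
        else:
            # No existing profile has room: open a new one.
            profiles.append([name])
            totals.append(minutes)
    return profiles


if __name__ == "__main__":
    # Hypothetical module runtimes in minutes, with a hypothetical 50-minute limit.
    print(split_into_profiles({"mod_a": 30, "mod_b": 25, "mod_c": 20, "mod_d": 15, "mod_e": 10}, 50))
    # [['mod_a', 'mod_c'], ['mod_b', 'mod_d', 'mod_e']]
```

First-fit decreasing does not always find the minimum number of profiles, but it never exceeds the limit and is cheap to rerun whenever test times change. As Dawid notes, this only postpones the problem if the total test time keeps growing.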
On Wed, Jun 26, 2019 at 9:34 AM Chesnay Schepler <ches...@apache.org> wrote:

> From what I gathered, there's no special sauce the Zeppelin project uses that actually integrates a user's Travis account into the PR.
> They just disabled Travis for PRs. And that's kind of it.
>
> Naturally we can do this (duh) and save the ASF a fair amount of resources, but there are downsides:
>
> The discoverability of the Travis check takes a nose-dive. Either we require every contributor to always, on every commit, also post a Travis build, or we have the reviewer sift through the contributor's account to find it.
>
> This is rather cumbersome. Additionally, it's also not equivalent to having a PR build.
>
> A normal branch build takes a branch as is and tests it. A PR build merges the branch into master, and then runs it. (Fun fact: this is why a PR with merge conflicts is not being run on Travis.)
>
> And ultimately, everyone can already make use of this approach anyway.

On 25/06/2019 08:02, Jark Wu wrote:

> Hi Jeff,
>
> Thanks for sharing the Zeppelin approach. I think it's a good idea to leverage users' Travis accounts.
> In this way, we can have almost unlimited concurrent build jobs, and developers can restart builds by themselves (currently only committers can restart a PR's build).
>
> But I'm still not very clear on how to integrate a user's Travis build into the Flink pull request's build automatically. Can you explain in more detail?
>
> Another question: does Travis only build branches for user accounts?
> My concern is that builds for PRs rebase the user's commits onto the current master branch.
> This helps us find problems before merging. Builds for branches lose the impact of new commits in master.
> How does Zeppelin solve this problem?
>
> Thanks again for sharing the idea.
>
> Regards,
> Jark

On Tue, 25 Jun 2019 at 11:01, Jeff Zhang <zjf...@gmail.com> wrote:

> Hi Folks,
>
> Zeppelin ran into this kind of issue before. We solved it by delegating each contributor's PR build to their own Travis account (everyone gets 5 free slots for Travis builds).
> The Apache account's Travis build is only triggered when a PR is merged.
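The distinction Chesnay draws between a branch build and a PR build, and Jark's concern about testing against the current master, can be reproduced by hand. Below is a minimal sketch that only assembles the git commands rather than running them. It relies on GitHub publishing each pull request's head as `pull/<N>/head`; the remote name `upstream`, the local branch names, and the `run_all` helper are assumptions for illustration.

```python
import subprocess
from typing import List

Command = List[str]


def branch_build_commands(remote: str, branch: str) -> List[Command]:
    """Check out a contributor branch exactly as pushed (a "branch build")."""
    return [
        ["git", "fetch", remote, branch],
        ["git", "checkout", "--detach", "FETCH_HEAD"],
    ]


def pr_build_commands(remote: str, pr_number: int, base: str = "master") -> List[Command]:
    """Merge a PR into the latest base branch before testing (a "PR build").

    If the PR conflicts with the base branch, the final `git merge` fails
    before any tests run.
    """
    local_pr = f"pr-{pr_number}"
    return [
        # Fetch the latest base branch and the PR head into a local branch.
        ["git", "fetch", remote, base, f"pull/{pr_number}/head:{local_pr}"],
        # Start a throwaway branch from the freshly fetched base.
        ["git", "checkout", "-B", f"{local_pr}-build", f"{remote}/{base}"],
        # Merge the PR on top, so new commits in master are part of the test.
        ["git", "merge", "--no-edit", local_pr],
    ]


def run_all(commands: List[Command], cwd: str = ".") -> None:
    """Hypothetical helper: run each command, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_all(pr_build_commands("upstream", <PR number>))` inside a clone would leave the merged result checked out, ready for the usual test command; the branch variant tests the contributor's commits in isolation, which is what a build on a personal Travis account does.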
Kurt Young <ykt...@gmail.com> wrote on Tue, Jun 25, 2019 at 10:16 AM:

> (Forgot to cc George)
>
> Best,
> Kurt

On Tue, Jun 25, 2019 at 10:16 AM Kurt Young <ykt...@gmail.com> wrote:

> Hi Bowen,
>
> Thanks for bringing this up. We have actually discussed this, and I think Till and George have already spent some time investigating it. I have cc'ed both of them, and maybe they can share their findings.
>
> Best,
> Kurt

On Tue, Jun 25, 2019 at 10:08 AM Jark Wu <imj...@gmail.com> wrote:

> Hi Bowen,
>
> Thanks for bringing this up. We have also suffered from the long build time.
> I agree that we should focus on solving the build capacity problem in this thread.
>
> My observation is that only one build is running, and all the others (other PRs, master) are pending.
> The pricing plan [1] of Travis shows it can support concurrent build jobs.
> But I don't know which plan we are using; it might be the free plan for open source.
>
> I cc-ed Chesnay, who may have some experience with Travis.
>
> Regards,
> Jark
>
> [1]: https://travis-ci.com/plans

On Tue, 25 Jun 2019 at 08:11, Bowen Li <bowenl...@gmail.com> wrote:

> Hi Steven,
>
> I think you may not have read what I wrote. The discussion is about "unstable build **capacity**", in other words "unstable / lack of build resources", not "unstable build".
On Mon, Jun 24, 2019 at 4:40 PM Steven Wu <stevenz...@gmail.com> wrote:

> A long and sometimes unstable build is definitely a pain point.
>
> I suspect the build failure here in flink-connector-kafka is not related to my change, but there is no easy way to re-run the build in the Travis UI. A Google search turned up a trick: closing and reopening the PR triggers a rebuild, but that could add noise to the PR activity.
> https://travis-ci.org/apache/flink/jobs/545555519
>
> Travis CI for my personal repo often failed by exceeding the time limit after 4+ hours:
>
>     The job exceeded the maximum time limit for jobs, and has been terminated.
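Steven's close-and-reopen workaround can be scripted against the GitHub REST API, where a pull request's `state` is changed with `PATCH /repos/{owner}/{repo}/pulls/{pull_number}`. Below is a minimal sketch that builds, but does not send, the two requests. The owner, repository, PR number, and token are placeholders, and whether a reopen actually retriggers CI depends on how the CI service is hooked up.

```python
import json
import urllib.request
from typing import List


def close_and_reopen_requests(owner: str, repo: str, pr_number: int, token: str) -> List[urllib.request.Request]:
    """Build (but do not send) the PATCH requests that close and then reopen a PR."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}"
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    }
    return [
        urllib.request.Request(
            url,
            data=json.dumps({"state": state}).encode("utf-8"),
            headers=headers,
            method="PATCH",
        )
        # Order matters: close first, then reopen.
        for state in ("closed", "open")
    ]
```

Sending the two requests in order with `urllib.request.urlopen(req)` would close and reopen the PR; as Steven points out, both events show up in the PR's activity.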
On Mon, Jun 24, 2019 at 4:15 PM Bowen Li <bowenl...@gmail.com> wrote:

> https://travis-ci.org/apache/flink/builds/549681530 This build request has been sitting at the **HEAD of the queue** since I first saw it at 10:30am PST (not sure how long it had been there before 10:30am). It's 4:12pm PST now and it hasn't started yet.

On Mon, Jun 24, 2019 at 2:48 PM Bowen Li <bowenl...@gmail.com> wrote:

> Hi devs,
>
> I've been experiencing the pain resulting from the lack of stable build capacity on Travis for Flink PRs [1].
> > > Specifically, I often notice that no build in the queue makes any progress for hours, and then suddenly 5 or 6 builds kick off all together after the long pause. I'm in the PST (UTC-08) time zone, and I've seen the pause last as long as 6 hours, from 9am to 3pm PST (not counting the time needed to drain the queue afterwards).
> > >
> > > I think this has greatly impacted our productivity. In my experience, PRs submitted in the early morning PST won't finish their build until late at night on the same day.
> > >
> > > So my questions are:
> > >
> > > - Has anyone else experienced the same problem or made similar observations on Travis CI? (I suspect it has something to do with time zones.)
> > >
> > > - What Travis CI pricing plan is Flink currently using? Is it the free plan for open source projects? What is the guaranteed build capacity of the current plan?
> > >
> > > - If the current pricing plan (either free or paid) can't provide stable build capacity, can we upgrade to a higher-priced plan with larger and more stable build capacity?
> > >
> > > BTW, another factor that contributes to the productivity problem is that our build is slow: we run a full build for every PR, and a successful full build takes ~5h. We definitely have more options to solve this, for instance, modularizing the build graph and reusing artifacts from the previous build. But I think that would be a big effort, much harder to accomplish in a short period of time, and it may deserve its own separate discussion.
> > >
> > > [1] https://travis-ci.org/apache/flink/pull_requests

-- 
Best Regards

Jeff Zhang