Some updates here: we have migrated most of the jobs to ci-hadoop.a.o. There is a known issue that our flaky dashboard is broken, due to this new feature of Jenkins:
https://wiki.jenkins.io/display/JENKINS/Configuring+Content+Security+Policy

Josh is contacting the infra team to see if they can relax the policy, but I do not think it will be easy, as the policy is per site, not per job... Anyway, there is a Chrome plugin to temporarily disable CSP, so you can view the correct flaky dashboard:

https://chrome.google.com/webstore/detail/disable-content-security/ieelmcmcagommplceebfedjlakkhpden

Thanks.

Andor Molnar <an...@apache.org> wrote on Thu, Jul 30, 2020 at 3:12 PM:

> https://issues.apache.org/jira/browse/INFRA-20613
>
>
> On 2020. Jul 30., at 1:47, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>>
>> This never worked in the past...
>>
>> But it would be great if you could kick the infra team to get this done :)
>>
>> File an infra issue?
>>
>> Andor Molnar <an...@apache.org> wrote on Wed, Jul 29, 2020 at 18:36:
>>
>>> You're having the same issue with HBase Robot, btw. At the end of the
>>> console outputs:
>>>
>>> "Could not update commit status, please check if your scan credentials
>>> belong to a member of the organization or a collaborator of the repository
>>> and repo:status scope is selected"
>>>
>>> ...and shortly after that:
>>>
>>> "GitHub has been notified of this commit's build result"
>>>
>>> Whatever that means.
>>>
>>> Andor
>>>
>>>
>>> On 2020. Jul 29., at 11:57, Andor Molnar <an...@apache.org> wrote:
>>>
>>>> Yep, we've finally received it. It's done.
>>>>
>>>> The current issue is that Jenkins is unable to set the GitHub build
>>>> status. I've added the repo:status permission, but it is also asking me
>>>> to be a member of the project/organization, and I am not sure how to do
>>>> that.
>>>>
>>>> Andor
>>>>
>>>>
>>>> On 2020. Jul 29., at 4:10, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>>>>
>>>>> Seems you have already made it?
>>>>>
>>>>> Usually there are several moderators for the private list; you need to
>>>>> ask them to let the GitHub registration go through.
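The "Could not update commit status" error quoted above involves the GitHub commit-status API, which a CI job calls with a personal access token carrying the repo:status scope. As a rough illustration, the sketch below only builds the request; the owner, repo, sha, and token values are placeholders, and nothing is sent unless you call `urlopen()` yourself.

```python
import json
from urllib import request

def build_status_request(owner, repo, sha, token, state, description):
    """Build (but do not send) a POST to GitHub's commit-status endpoint."""
    url = f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}"
    payload = {
        "state": state,              # "success", "failure", "error" or "pending"
        "description": description,
        "context": "jenkins/precommit",  # hypothetical status context name
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"token {token}",  # token needs the repo:status scope
            "Accept": "application/vnd.github.v3+json",
        },
        method="POST",
    )

# Placeholder values; a real job would use the tested commit's actual sha.
req = build_status_request("apache", "hbase", "0" * 40, "TOKEN",
                           "success", "precommit passed")
print(req.full_url)
```

The "collaborator of the repository" part of the error is a separate permission check on the account that owns the token, which is why adding the scope alone was not enough.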
>>>>>
>>>>> Andor Molnar <an...@apache.org> wrote on Wed, Jul 29, 2020 at 1:03 AM:
>>>>>
>>>>>> Thanks Duo, that's very helpful.
>>>>>> I cannot set private@zookeeper as a verified e-mail address, because the
>>>>>> verification e-mail cannot be sent to the list. Isn't that restricted to
>>>>>> members only (by default)?
>>>>>>
>>>>>> Andor
>>>>>>
>>>>>>
>>>>>> On 2020. Jul 28., at 3:15, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Andor,
>>>>>>>
>>>>>>> The Apache-HBase account is registered by me, using the private@hbase
>>>>>>> mailing list, so all the PMC members can maintain the password.
>>>>>>>
>>>>>>> I generated an access token and added it to our Jenkins, so we can use
>>>>>>> it to post comments back to GitHub.
>>>>>>>
>>>>>>> I think you could do the same to register an Apache-ZooKeeper account?
>>>>>>> Or, if you want to use the hadoop-yetus account, you had better ask the
>>>>>>> Hadoop PMC members or Gavin to add the token to Jenkins so you can use
>>>>>>> it.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Andor Molnar <an...@apache.org> wrote on Tue, Jul 28, 2020 at 3:56 AM:
>>>>>>>
>>>>>>>> Hi Duo,
>>>>>>>>
>>>>>>>> I'm trying to create a similar job for Apache ZooKeeper, but
>>>>>>>> unfortunately I haven't got much help on the Apache builds@ list so
>>>>>>>> far, so I'm rather asking you, if you don't mind.
>>>>>>>>
>>>>>>>> First, how have you set up the HBase GitHub account that you use in
>>>>>>>> this job to access the repo?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andor
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020. Jul 27., at 2:22, 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The pre commit job has been migrated to ci-hadoop.a.o.
>>>>>>>>>
>>>>>>>>> I have disabled the periodical scan for the old job on builds.a.o; as
>>>>>>>>> we still need to view the pre commit results on it, do not delete it
>>>>>>>>> for now.
>>>>>>>>> Will delete it later, maybe after several weeks.
>>>>>>>>>
>>>>>>>>> The new job is here:
>>>>>>>>>
>>>>>>>>> https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Sat, Jul 25, 2020 at 9:44 PM:
>>>>>>>>>
>>>>>>>>>> https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/5/console
>>>>>>>>>>
>>>>>>>>>> We successfully finished a nightly build.
>>>>>>>>>>
>>>>>>>>>> But it seems the jiraComment step did not work. I haven't seen the
>>>>>>>>>> comment on HBASE-24757...
>>>>>>>>>>
>>>>>>>>>> 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Sat, Jul 25, 2020 at 4:51 PM:
>>>>>>>>>>
>>>>>>>>>>> After installing two new Jenkins plugins, the pre commit job seems
>>>>>>>>>>> fine now.
>>>>>>>>>>>
>>>>>>>>>>> The last failure was because of a timeout; I assume the problem is
>>>>>>>>>>> that we do not have enough executors, so all the jobs are executed
>>>>>>>>>>> sequentially.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we could move the pre commit job to the new env first? The
>>>>>>>>>>> nightly job and flaky job require more resources, and we need the
>>>>>>>>>>> output of these Jenkins jobs (the flaky test list).
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Fri, Jul 24, 2020 at 4:36 PM:
>>>>>>>>>>>
>>>>>>>>>>>> The problem seems to be because of this:
>>>>>>>>>>>>
>>>>>>>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-48556
>>>>>>>>>>>>
>>>>>>>>>>>> I triggered the job again; it passed the timestamps call, and I
>>>>>>>>>>>> will keep an eye on it.
>>>>>>>>>>>>
>>>>>>>>>>>> 张铎(Duo Zhang) <palomino...@gmail.com> wrote on Tue, Jul 21, 2020 at 11:18 AM:
>>>>>>>>>>>>
>>>>>>>>>>>>> On the sponsors, we could have a try.
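For readers unfamiliar with the "timestamps call" Duo says the stuck job finally got past, it normally lives in a declarative pipeline's options block. The fragment below is a hypothetical sketch, not HBase's actual Jenkinsfile; the stage name and script path are placeholders, and timestamps() depends on the Timestamper plugin being installed on the controller.

```groovy
// Hypothetical declarative-pipeline sketch (not the real HBase Jenkinsfile).
pipeline {
    agent any
    options {
        timestamps()                     // the step the thread's stuck job was blocked at
        timeout(time: 6, unit: 'HOURS')  // bound runaway runs when executors are scarce
    }
    stages {
        stage('precommit') {
            steps {
                sh './run-precommit.sh'  // placeholder for the real precommit script
            }
        }
    }
}
```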
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem here is the process of the donation? IIRC there is a
>>>>>>>>>>>>> thread on the infra mailing list about how to donate machines to
>>>>>>>>>>>>> a specific project, and the discussion did not go well...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sean Busbey <bus...@apache.org> wrote on Tue, Jul 21, 2020 at 11:13 AM:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We could check with ASF infra for the current state of things
>>>>>>>>>>>>>> wrt GitHub Actions. I believe there is a queue set up across ASF
>>>>>>>>>>>>>> projects.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It has the same resource issue Travis had; things are fine until
>>>>>>>>>>>>>> some critical mass of projects seeking better perf realize some
>>>>>>>>>>>>>> new option is available, and then quickly all available
>>>>>>>>>>>>>> resources are consumed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAICT the only option that gets us the same as or better than
>>>>>>>>>>>>>> the H* nodes will be finding sponsors and running our own.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 20, 2020, 21:55 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think our nightly, flaky, and pre commit jobs should be
>>>>>>>>>>>>>>> transferred as a whole? They depend on each other.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I offer my help on the transition.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And on GitHub CI, does the ASF have a special deal with GitHub?
>>>>>>>>>>>>>>> If not, I do not think the default resources can fit our
>>>>>>>>>>>>>>> requirements...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sean Busbey <bus...@apache.org> wrote on Tue, Jul 21, 2020 at 1:49 AM:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi folks!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Back in April there was a brief discussion[1] about ASF
>>>>>>>>>>>>>>>> Infra's notification that builds.a.o is going away and we are
>>>>>>>>>>>>>>>> currently slated to migrate to a set of CI servers for "Hadoop
>>>>>>>>>>>>>>>> and related projects". This is the CI farm that will contain
>>>>>>>>>>>>>>>> the bulk of the H* worker nodes that are donated by Yahoo!,
>>>>>>>>>>>>>>>> which are the nodes we've been running on for ages[2].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Migration discussion still happens on the
>>>>>>>>>>>>>>>> hadoop-migrations@i.a.o list[3], and recently ASF Infra set a
>>>>>>>>>>>>>>>> target date of August 15th for turning off the existing
>>>>>>>>>>>>>>>> builds.a.o server[4].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That gives us a little under 4 weeks to have things up and
>>>>>>>>>>>>>>>> working on the new ci-hadoop.a.o Jenkins coordinator[5]. It's
>>>>>>>>>>>>>>>> not clear to me that the level of effort we'll need to spend
>>>>>>>>>>>>>>>> is worth what we get out of a continuation of the status quo
>>>>>>>>>>>>>>>> on builds.a.o. I did a quick test by updating the nightly job
>>>>>>>>>>>>>>>> on ci-hadoop.a.o to run just branch-2, since that has been
>>>>>>>>>>>>>>>> stable on builds.a.o. It failed with a Jenkins pipeline DSL
>>>>>>>>>>>>>>>> syntax error[6], so I'm assuming migrating will be a slog.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As far as I can see, our options are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Do nothing. Have no testing or automated website publication
>>>>>>>>>>>>>>>> in mid August.
>>>>>>>>>>>>>>>> * Transition website publication and nothing else (probably
>>>>>>>>>>>>>>>> can be done in a day)
>>>>>>>>>>>>>>>> * Transition just precommit testing for various repos
>>>>>>>>>>>>>>>> (probably can be done in a few days)
>>>>>>>>>>>>>>>> * Transition everything (no idea how long it takes due to
>>>>>>>>>>>>>>>> nightly, flaky stuff, etc.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The alternatives if we do not transition any given job to
>>>>>>>>>>>>>>>> ci-hadoop:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * Try to move to GitHub Actions
>>>>>>>>>>>>>>>> * Try to move to Travis CI
>>>>>>>>>>>>>>>> * Try to move to Jenkins infra we maintain ourselves
>>>>>>>>>>>>>>>> (presumably by soliciting project-specific donations for
>>>>>>>>>>>>>>>> worker nodes on cloud vendors)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's important to remember that as a project we have a heavy
>>>>>>>>>>>>>>>> footprint wherever our nightly tests run. For context, a given
>>>>>>>>>>>>>>>> branch's nightly can keep 3-4 executors busy for 6+ hours on
>>>>>>>>>>>>>>>> the current builds.a.o setup. There's been a bunch of great
>>>>>>>>>>>>>>>> work lately on bringing down what it takes to run the full
>>>>>>>>>>>>>>>> test suite, but applying that work to nightly is itself a
>>>>>>>>>>>>>>>> significant undertaking.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What are folks thinking? Most importantly, who is ready to
>>>>>>>>>>>>>>>> work towards any given approach?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] [DISCUSS] Migrating HBase to new CI Master
>>>>>>>>>>>>>>>> https://s.apache.org/fux1o
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [2] https://builds.apache.org/view/H-L/view/HBase/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [3] https://lists.apache.org/list.html?hadoop-migrati...@infra.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [4] [IMPORTANT] - 2 more HADOOP nodes migrated over to ci-hadoop
>>>>>>>>>>>>>>>> https://s.apache.org/7e1nq
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [5] https://ci-hadoop.apache.org/job/HBase/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [6] https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/2/console
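Sean's resource figures in the thread (a branch's nightly keeping 3-4 executors busy for 6+ hours) give a rough sense of the daily load any replacement CI would have to absorb. A back-of-envelope estimate, assuming four active branches (the branch count is an assumption, not a figure from the thread):

```python
# Back-of-envelope load estimate from the numbers in Sean's message.
branches = 4                 # assumed: e.g. master, branch-2, branch-2.x, branch-1
executors_per_nightly = 4    # upper end of the quoted "3-4 executors"
hours_per_nightly = 6        # lower bound of the quoted "6+ hours"

executor_hours_per_day = branches * executors_per_nightly * hours_per_nightly
print(executor_hours_per_day)  # 96 executor-hours per day, before precommit jobs
```

Even at this conservative estimate, the load is far beyond what free shared CI tiers typically offer, which is the point behind the "finding sponsors and running our own" suggestion.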