Great analysis, Damian. Thanks for taking a look and fixing this. Good
to know it was not related to Beam's code.

I wonder if we should increase the input size for the open source
runners (it is currently 1/10 of Dataflow's, which explains the big
difference in run time) so that we can detect regressions better. The
current size is so small that 1s of extra time in some runs looks like
a 50-60% degradation, and we cannot tell whether it is due to a small
CPU/GC pause or a real regression. I wonder, however, whether this
would negatively impact worker utilization.
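
To put rough numbers on that, here is a minimal sketch. The 1.8s
baseline is an assumption chosen to match the 50-60% figure, not a
value read from the dashboard, and the 10x case assumes run time
scales linearly with input size, which may not hold in practice.

```python
def relative_slowdown(baseline_s: float, extra_s: float) -> float:
    """Return the extra time as a fraction of the baseline run time."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return extra_s / baseline_s


# Assumed numbers for illustration only, not measured values.
current_baseline_s = 1.8                    # run time at the current input size
scaled_baseline_s = 10 * current_baseline_s  # same job at 10x input, if linear
noise_s = 1.0                               # a one-off CPU/GC pause

for label, baseline in [("current size", current_baseline_s),
                        ("10x size", scaled_baseline_s)]:
    share = relative_slowdown(baseline, noise_s)
    print(f"{label}: +{noise_s:.0f}s looks like a {share:.0%} degradation")
```

Under these assumptions the same 1s pause shows up as roughly 56% at
the current size but only about 6% at the larger size, so noise would
be much easier to tell apart from a real regression.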


On Mon, Jul 27, 2020 at 4:07 PM Damian Gadomski
<damian.gadom...@polidea.com> wrote:
>
> Hey all,
>
> I've done a few checks to pinpoint the issue and it seems that I've just 
> fixed it.
>
> I didn't know this before, but the Flink, Spark and Direct Nexmark tests 
> run on a special Jenkins worker. `apache-beam-jenkins-16` is labeled 
> with `beam-perf`, so only these tests can execute there. I'm not sure, 
> because the configuration on the old CI is already gone, but I guess that 
> this worker was configured to have only one executor (which I had missed). 
> That would prevent concurrent execution of the jobs and improve/stabilize the 
> timings.
>
> That's how I have now configured the node, and it seems that the timings are 
> back to the pre-migration values: 
> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now
>
> Dataflow was not affected because it wasn't restricted to run on 
> `apache-beam-jenkins-16`.
>
> Regards,
> Damian
>
>
> On Wed, Jul 22, 2020 at 5:11 PM Kenneth Knowles <k...@apache.org> wrote:
>>
>> Are Spark and Flink runners benchmarking against local clusters on the 
>> Jenkins VMs? Needless to say that is not a very controlled environment (and 
>> of course not realistic scale). That is probably why Dataflow was not 
>> affected. Is it possible that simply the different version of the Jenkins 
>> worker software and/or the instructions from the Cloudbees instance cause 
>> differing load?
>>
>> Kenn
>>
>> On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev <valen...@google.com> 
>> wrote:
>>>
>>> FYI it looks like the transition to new Jenkins CI is visible on Nexmark 
>>> performance graphs[1][2]. Are new VM nodes less performant than old ones?
>>>
>>> [1] 
>>> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=1587597387737&to=1595373387737&var-processingType=batch&var-ID=All&var-runner=All
>>> [2] 
>>> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>>>
>>> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton <tyso...@google.com> wrote:
>>>>
>>>> Currently no. We're already experiencing a backlog of builds so the 
>>>> additional load would be a problem. I've opened two related issues that I 
>>>> think need completion before allowing non-committers to trigger tests:
>>>>
>>>> Load sharing improvements: https://issues.apache.org/jira/browse/BEAM-10281
>>>> Admin access (maybe not required but nice to have): 
>>>> https://issues.apache.org/jira/browse/BEAM-10280
>>>>
>>>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track 
>>>> opening up triggering for non-committers.
>>>>
>>>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>> Was about to ask the same question, so can non-committers trigger the 
>>>>> tests now?
>>>>>
>>>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee <heej...@google.com> wrote:
>>>>>>
>>>>>> This is awesome. Could non-committers also trigger the test now?
>>>>>>
>>>>>> On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski 
>>>>>> <damian.gadom...@polidea.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Good news, we've just migrated to the new CI: 
>>>>>>> https://ci-beam.apache.org. As of now, the Beam projects at 
>>>>>>> builds.apache.org are disabled.
>>>>>>>
>>>>>>> If you experience any issues with the new setup please let me know, 
>>>>>>> either here or on ASF slack.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Damian
>>>>>>>
>>>>>>> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski 
>>>>>>> <damian.gadom...@polidea.com> wrote:
>>>>>>>>
>>>>>>>> Happy to see your positive response :)
>>>>>>>>
>>>>>>>> @Udi Meiri, Thanks for pointing that out. I've checked it and indeed 
>>>>>>>> it needs some attention.
>>>>>>>>
>>>>>>>> There are two things, based on my research:
>>>>>>>>
>>>>>>>> 1. Data uploaded directly to InfluxDB by the performance and load test 
>>>>>>>> jobs. That should be handled automatically, as the new jobs will 
>>>>>>>> upload the same data in the same way.
>>>>>>>> 2. Data fetched using the Jenkins API by the metrics tool 
>>>>>>>> (syncjenkins.py). Here the situation is a bit more complex, as the 
>>>>>>>> script relies on the build number (it's actually used as a time 
>>>>>>>> reference, and the primary key in the DB is created from it). To avoid 
>>>>>>>> refactoring the script and migrating the database to use a timestamp 
>>>>>>>> instead of the build number, I've just "fast-forwarded" the numbers on 
>>>>>>>> the new https://ci-beam.apache.org to follow the current numbering 
>>>>>>>> from the old CI. Therefore, simply replacing the Jenkins URL in the 
>>>>>>>> metrics scripts should do the trick to have continuous metrics data. 
>>>>>>>> I'll check that tomorrow on my local Grafana instance.
>>>>>>>>
>>>>>>>> Please let me know if there's anything that I missed.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko 
>>>>>>>> <aromanenko....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Great! Thank you for working on this and letting us know.
>>>>>>>>>
>>>>>>>>> On 12 Jun 2020, at 16:58, Damian Gadomski 
>>>>>>>>> <damian.gadom...@polidea.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> During the last few days, I was preparing for the Beam Jenkins 
>>>>>>>>> migration from builds.apache.org to ci-beam.apache.org. The new 
>>>>>>>>> Jenkins Master will be dedicated only to Beam-related jobs. All Beam 
>>>>>>>>> Committers will have build configure access, and the Beam PMC will 
>>>>>>>>> have Admin (GUI) Access.
>>>>>>>>>
>>>>>>>>> We (in cooperation with Infra) are almost ready for the migration 
>>>>>>>>> itself and I want to share with you the details of our plan. We are 
>>>>>>>>> planning to start the migration next week, most likely on Tuesday. 
>>>>>>>>> I'll keep you updated on the progress. We do not expect any issues 
>>>>>>>>> or any outage of the CI services; everything should be more or less 
>>>>>>>>> unnoticeable. Just don't be surprised that the Jenkins URL will 
>>>>>>>>> change to https://ci-beam.apache.org
>>>>>>>>>
>>>>>>>>> If you are curious, here are the steps that we are going to take:
>>>>>>>>>
>>>>>>>>> 1. Create 16 new CI nodes that will be connected to the new CI. We 
>>>>>>>>> will then have two CI servers running simultaneously.
>>>>>>>>> 2. Verify that new builds work as expected on the new instance 
>>>>>>>>> (compare results of cron builds); a day or two should be sufficient.
>>>>>>>>> 3. Move the responsibility of Phrase/PR/Commit builds to the new CI, 
>>>>>>>>> disable on the old one.
>>>>>>>>> 4. Modify the .test-infra/jenkins/README.md to point to the new 
>>>>>>>>> instance and replace Post-commit tests status in README.md and 
>>>>>>>>> .github/PULL_REQUEST_TEMPLATE.md
>>>>>>>>> 5. Disable the jobs on the old Jenkins and add a description to each 
>>>>>>>>> job with the URL to the corresponding one on the new CI.
>>>>>>>>> 6. Turn off VM instances of the old nodes.
>>>>>>>>> 7. Remove VM instances of the old nodes.
>>>>>>>>>
>>>>>>>>> In case of any questions or doubts feel free to ask :)
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Damian
>>>>>>>>>
>>>>>>>>>
