[
https://issues.apache.org/jira/browse/CASSANDRA-17563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525883#comment-17525883
]
David Capwell commented on CASSANDRA-17563:
-------------------------------------------
bq. has only an addition of a new job in_jvm which match as resource usage the
old single in_jvm job and no other changes applied.
Currently this is a massive amount of manual work to confirm, and there isn't a
good source of truth to compare against; this patch moves the source of truth
into a map so we know what happens (if you don't update you get default for the
level, else you add an override)
We talked about this a lot in slack, and my personal feeling is every step in a
manual process is another chance for error, so the more steps done the higher
risk to do it incorrectly; the current process as I can find is the following
1) create config-2_1.yml.MIDRES and config-2_1.yml.HIGHER using the current
patches
2) update config-2_1.yml with your change
3) update config-2_1.yml.MIDRES with your change, and figure out how to apply
the updated resources
4) update config-2_1.yml.HIGHER with your change and figure out how to apply
the updated resources (method does not match step 3, so the "how" is different
here)
5) generate diff for MIDRES and update patch
6) generate diff for HIGHER and update patch
7) test LOWER - success is defined as "what failed before is the only thing
failing now"
8) test MIDRES - success is defined as "what failed before is the only thing
failing now"
9) test HIGHER - success is defined as "what failed before is the only thing
failing now"
did I miss anything?
bq. There are changes to the other generate.sh script I haven't looked at but
any change there need to be tested
given there are 0 tests for the script, 2 different people "testing" could
yield different results, so we would need to have some way to define success
that is agreed upon.
For example, I fixed what I saw was a bug, when you ask it to generate LOWER,
MIDRES, or HIGHER it doesn't actually update those files and instead only
updates config.yml; the help page says this but to me this is unexpected
behavior
-a updates config.yml.LOWER, config.yml.MIDRES, and config.yml.HIGHER,
-h updates config.yml only!
I am totally cool with -h updating config.yml as well, but it feels like a bug
that config.yml.HIGHER isn't updated... so my patch changing that... which one
of us is the bug?
Now, if we want to define it as "they did the same thing regardless of personal
feelings about correct behavior" then I do know for a fact my patch is
different; I am 100% ok reverting that difference
bq. My concern is we don't know who was using what and how and it was working
fine for quite some time
I feel like a politician... can you define the word "what"?
bq. Do we want to rewrite the whole approach one week before freeze when people
highly utilize CI to push their latest work?
the core change is moving away from patch to modifying the yaml tree; there are
other changes but those are personal preference and 100% fine to drop...
To me I ask the following question "if you yaml diff the old and new files, are
there a difference?" if the answer is no, then there isn't much of a risk other
than the script not working on an unknown laptop (which impacts generate.sh
only, not CI configs).
Now, if you want to de-risk that, we could use this script to generate the
patches, but we don't solve the real problem of patches applying when they
shouldn't (which is how I broke MIDRES). If we want to do that to lower risk
before 4.1 freeze I am cool with that, but do not think that is a valid long
term solution
bq. now we will have a mix of python and shell scripts, are we sure the
community will accept that?
that is something anyone who touches this needs to answer, which is why I tried
to pull in anyone who touched this logic to get their feedback. I do know that
many in the community basically do this already (can tell by looking at circle
ci as my private scripts rename things and cleanup our DAG), so its just moving
part of that private logic into OSS to help maintain these files.
bq. I really like and appreciate how you added diff but I am confused from the
output what I am seeing actually. I see the new name and resource change.
do you mean the output of the dump of what each job's resource is?
{code}
$ diff midres.resources midres.resources.new
9a10
> j11_jvm_dtests_vnode medium 10
22a24
> j8_jvm_dtests_vnode large 10
{code}
so this is diff output, so ">" means that the right-hand-side has the
following, but there is no matching on the left hand side... aka "new job"
you see that MIDRES j8 and j11 do not have matching resources! This is because
they don't now (as defined before the vnode patch), so I am pointing out that
j11 and j8 run with different resources on MIDRES and that I am not changing
that behavior in this patch (more than glad to if you desire)
{code}
j11_jvm_dtests medium 10
j11_jvm_dtests_vnode medium 10
j8_jvm_dtests large 10
j8_jvm_dtests_vnode large 10
{code}
they match in HIGHER but not MIDRES, I do not know if there was a reason for
this before so I left it to be true still.
> Fix CircleCI Midres config
> --------------------------
>
> Key: CASSANDRA-17563
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17563
> Project: Cassandra
> Issue Type: Bug
> Components: CI
> Reporter: Ekaterina Dimitrova
> Priority: Normal
> Fix For: 4.1
>
>
> During CircleCI addition of a new job to the config, the midres file got
> messy. Two of the immediate issues (but we need to verify all jobs will use
> the right executors and resources):
> * the new job needs to use higher parallelism as the original in-jvm job
> * j8_dtests_with_vnodes should get from midres 50 large but currently
> midres makes it run with 25 and medium which fails around 100 tests
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]