[ https://issues.apache.org/jira/browse/SOLR-15644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419087#comment-17419087 ]
Mark Robert Miller edited comment on SOLR-15644 at 9/23/21, 9:40 AM:
---------------------------------------------------------------------
I’m not sure where I’m being offensive. I don’t expect you to go fix all the
Solr issues; it’s a ridiculous amount of time and effort, and I can tell you Uwe
isn’t going to do it either. And it’s not a knock on the test framework either.
If I thought it should have to deal with Solr, all of its random dependencies,
and its problems in an ideal way for every case that Solr faces, I would say it
should be changed. I haven’t tried to change anything in it; I use it as
intended: it catches bad behavior, and if I can address that behavior, I do.
And you can in 100s of cases. In far fewer cases, you either cannot, or the
effort is too large. For instance, many things can be addressed by making Solr
handle interrupts properly and making it close/shut down properly. Both are
absolutely huge endeavors; I have done it, and it would still take forever to
repeat.
I won’t get into all the issues, because most of them apply when the tests take
milliseconds to seconds. When most of the tests take tens of seconds to minutes,
I don’t really care about the performance of moving from test to test. I’ll
wait, linger, do rounds of interrupts, whatever. The broad linger and other
waits are still detrimental, because you lose the value of the framework letting
you know what’s not right: things can be added now that cause almost every test
to linger the full 10 seconds and then get interrupted, and no one would even
bat an eye or notice. But even that is not a big deal in the current state of
things.
The only things I have to deal with in this world are cases where you remove
some of the exceptions and slowness from a test, and it has objects/threads that
you don’t want carrying on into other tests; with some layers removed, the test
framework will interrupt them, they won’t stop in time, and it will fail the
test run. But I can stop them without failing the run, and do it relatively
quickly. Not with any broad approach, but specifically for that problematic
resource.
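Stopping one specific problematic resource, rather than relying on a broad linger, could look roughly like the following minimal Java sketch. It assumes the leftover threads belong to a `java.util.concurrent.ExecutorService` that the test itself owns; `TargetedShutdown` and `stopExecutor` are hypothetical names, not Solr or Lucene test-framework API, and the timeouts are only illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TargetedShutdown {

  /**
   * Hypothetical helper: stops one specific executor owned by a test instead of
   * relying on a framework-wide linger. Returns true if the executor terminated.
   */
  static boolean stopExecutor(ExecutorService executor, long graceMs, long interruptWaitMs)
      throws InterruptedException {
    executor.shutdown(); // stop accepting new tasks; running tasks may finish
    if (executor.awaitTermination(graceMs, TimeUnit.MILLISECONDS)) {
      return true; // finished without any interrupt
    }
    executor.shutdownNow(); // fallback: interrupt the running tasks
    return executor.awaitTermination(interruptWaitMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.submit(() -> {
      try {
        Thread.sleep(10_000); // simulates a slow background task
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit promptly on interrupt
      }
    });
    System.out.println("stopped=" + stopExecutor(executor, 100, 1_000));
  }
}
```

The point of the design is that the grace period and the fallback are chosen per resource, so a test only pays for the waiting it actually needs.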
The other items are no longer very interesting to me, but there are various
cases. Sometimes there are items where, if you just wait a very short time, they
will close very quickly, but if you hit them with an interrupt, they will take
much longer.
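For resources that close quickly if left alone but slowly once interrupted, one option is to wait briefly first and interrupt only as a fallback. This is a minimal Java sketch using only `Thread.join(long)` and `Thread.interrupt()`; `GraceThenInterrupt` and `stopThread` are hypothetical names, and the grace period would have to be tuned for each resource.

```java
public class GraceThenInterrupt {

  /**
   * Hypothetical helper: give a thread a short grace period to exit on its own,
   * and interrupt it only if it is still alive afterwards. Both timeouts must be
   * positive, because Thread.join(0) waits forever. Returns true if the thread
   * is no longer alive.
   */
  static boolean stopThread(Thread t, long graceMs, long afterInterruptMs)
      throws InterruptedException {
    if (graceMs <= 0 || afterInterruptMs <= 0) {
      throw new IllegalArgumentException("timeouts must be positive");
    }
    t.join(graceMs); // wait briefly without interrupting
    if (!t.isAlive()) {
      return true; // exited during the grace period, no interrupt needed
    }
    t.interrupt(); // fallback: interrupt the thread
    t.join(afterInterruptMs);
    return !t.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulates a resource that closes on its own shortly after being asked to.
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
      }
    }, "worker");
    worker.start();
    System.out.println("stopped=" + stopThread(worker, 500, 1_000));
  }
}
```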
There are cases with the Overseer where, if you hit it with an interrupt, you
may poke the bear and find it almost impossible to stop. There are other cases
where an interrupt gets you out of one layer of third-party code, but not fully
out, and another quick interrupt or two will get you out. All of these cases are
very individualized to mostly isolated objects/dependencies, and if I’m doing
things well, I don’t want any kind of broad behavior to deal with them. I want
the framework to tightly control and fail everything I don’t have to
individually work around as a last resort.
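The "another quick interrupt or two" case could be handled with a bounded number of interrupt rounds, each followed by a short wait. This minimal Java sketch assumes the target is a plain `Thread` whose code swallows interrupts one layer at a time (simulated below); `InterruptRounds` and `interruptInRounds` are hypothetical names, not an existing framework method.

```java
public class InterruptRounds {

  /**
   * Hypothetical helper: interrupt a thread up to maxRounds times, waiting a
   * short time after each interrupt, for code where one interrupt only unwinds
   * one layer. Returns the number of interrupts sent, or -1 if still alive.
   */
  static int interruptInRounds(Thread t, int maxRounds, long waitPerRoundMs)
      throws InterruptedException {
    if (waitPerRoundMs <= 0) {
      throw new IllegalArgumentException("wait must be positive; join(0) waits forever");
    }
    for (int round = 1; round <= maxRounds; round++) {
      t.interrupt();
      t.join(waitPerRoundMs);
      if (!t.isAlive()) {
        return round;
      }
    }
    return -1; // give up and let the caller decide how to fail
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulates code that swallows the first interrupt (one "layer") and only
    // exits once it has seen a second one.
    Thread worker = new Thread(() -> {
      int interruptsSeen = 0;
      while (interruptsSeen < 2) {
        try {
          Thread.sleep(10_000);
        } catch (InterruptedException e) {
          interruptsSeen++; // swallowed: the interrupt flag is now cleared
        }
      }
    }, "layered-worker");
    worker.start();
    System.out.println("interrupts sent=" + interruptInRounds(worker, 5, 200));
  }
}
```

Returning the round count makes it easy to log how many interrupts a given resource actually needed, which keeps the workaround specific instead of broad.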
Anyway, there are 1000s of issues; it just depends on what you care about and
what affects you. You could look at the big integration tests, say they are
heavy, crank down the number of test JVMs, and say that Solr tests are often
heavy, so use fewer JVMs than Lucene. And that might be the end of it, and Solr
will still search your data. You could also look at those test JVMs when you
start up 15 at once and see most of the heavy tests sitting there using 0-3% of
CPU most of the time. You could then look into that and find 1000 things that,
if addressed, make those tests run as fast as or faster than tight,
dependency-free Lucene tests. If you took the former approach, maybe there are
not 1000 problems: the tests are passing and you are searching your data. With
the second approach, you end up with a slightly different system, where those
huge integration tests are giving Lucene a hard time, actually using CPU and
moving, and exposing real issues that are never seen when they sit around
mostly hanging out. 1000 issues from one angle, no major issue from the other.
Which perspective I’m seeing depends on whether I’m just collecting a paycheck
or want to be honest about what’s going on in front of me, whether it’s a
tangential piece of software in my daily life or a core piece.
> Add the ability to interrupt and wait for threads for problematic tests.
> ------------------------------------------------------------------------
>
> Key: SOLR-15644
> URL: https://issues.apache.org/jira/browse/SOLR-15644
> Project: Solr
> Issue Type: Test
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Tests
> Reporter: Mark Robert Miller
> Assignee: Mark Robert Miller
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The stuff in the test framework is slow and lacks control. For problematic
> tests, you don't want to linger first and you want fine control around
> interrupting - interrupting with a sledgehammer approach can actually make
> things take longer.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)