[ https://issues.apache.org/jira/browse/SOLR-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Risden resolved SOLR-7215.
--------------------------------
    Resolution: Fixed

Resolving as most likely fixed somewhere along the way, since it's been 3 years since the last comment.

> non reproducible Suite failures due to excessive sysout due to HDFS lease renewal WARN logs due to connection refused -- even if test doesn't use HDFS (ie: threads leaking between tests)
> ------------------------------------------------------------------------
>
>                 Key: SOLR-7215
>                 URL: https://issues.apache.org/jira/browse/SOLR-7215
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: tests-report.txt_suite-failure-due-to-sysout.txt.zip
>
>
> On my local machine, i've noticed lately a lot of sporadic, non-reproducible failures like these...
> {noformat}
>    2> NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E254A7E69EC7212A -Dtests.slow=true -Dtests.locale=sv -Dtests.timezone=SystemV/CST6 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> [14:34:23.749] ERROR   0.00s J1 | ScriptEngineTest (suite) <<<
>    > Throwable #1: java.lang.AssertionError: The test or suite printed 10984 bytes to stdout and stderr, even though the limit was set to 8192 bytes. Increase the limit with @Limit, ignore it completely with @SuppressSysoutChecks or run with -Dtests.verbose=true
>    >    at __randomizedtesting.SeedInfo.seed([E254A7E69EC7212A]:0)
>    >    at org.apache.lucene.util.TestRuleLimitSysouts.afterIfSuccessful(TestRuleLimitSysouts.java:212)
> {noformat}
> Invariably, looking at the logs of tests that fail for this reason, i see multiple instances of these WARN msgs...
> {noformat}
>    2> 601361 T3064 oahh.LeaseRenewer.run WARN Failed to renew lease for [DFSClient_NONMAPREDUCE_-253604438_2947] for 92 seconds.  Will retry shortly ...
> java.net.ConnectException: Call From frisbee/127.0.1.1 to localhost:40618 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>    2>    at sun.reflect.GeneratedConstructorAccessor268.newInstance(Unknown Source)
>    2>    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> ...
> {noformat}
> ...the full stack traces of these exceptions typically being 36 lines long (not counting the suppressed "... 17 more" at the end)
> doing some basic crunching of the "tests-report.txt" file from a recent run of all "solr-core" tests (that caused the above failure) leads to some pretty damn disconcerting numbers...
> {noformat}
> hossman@frisbee:~/tmp$ wc -l tests-report.txt_suite-failure-due-to-sysout.txt
> 1049177 tests-report.txt_suite-failure-due-to-sysout.txt
> hossman@frisbee:~/tmp$ grep "Suite: org.apache.solr" tests-report.txt_suite-failure-due-to-sysout.txt | wc -l
> 465
> hossman@frisbee:~/tmp$ grep "LeaseRenewer.run WARN Failed to renew lease" tests-report.txt_suite-failure-due-to-sysout.txt | grep http://wiki.apache.org/hadoop/ConnectionRefused | wc -l
> 1988
> hossman@frisbee:~/tmp$ calc
> 1988 * 36
> 71568
> {noformat}
> So running 465 Solr test suites, we got ~2 thousand of these "Failed to renew lease" WARNings. Of the ~1 million total lines of log messages from all tests, ~70 thousand (~7%) are coming from these WARNing messages -- which can evidently be safely ignored?
> Something seems broken here.
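
For anyone who wants to redo the crunching above on their own tests-report.txt, here is a minimal Python sketch of the same counts. The function name lease_warn_share and its trace_lines parameter are made up for illustration; the match strings are copied from the grep commands quoted above, and the 36-lines-per-trace figure is the reporter's estimate, not something the script measures.

```python
# Assumption: each WARN drags along a stack trace of roughly 36 lines,
# which is the estimate quoted in the issue, not a measured value.
TRACE_LINES = 36


def lease_warn_share(lines, trace_lines=TRACE_LINES):
    """Return (suites, warns, estimated_warn_lines, share_of_all_lines)."""
    total = suites = warns = 0
    for line in lines:
        total += 1
        if "Suite: org.apache.solr" in line:
            suites += 1
        # Mirror the two chained greps: both strings must be on one line.
        if ("LeaseRenewer.run WARN Failed to renew lease" in line
                and "http://wiki.apache.org/hadoop/ConnectionRefused" in line):
            warns += 1
    estimated = warns * trace_lines
    share = estimated / total if total else 0.0
    return suites, warns, estimated, share


if __name__ == "__main__":
    import sys

    # Usage: python lease_warn_share.py path/to/tests-report.txt
    with open(sys.argv[1], errors="replace") as report:
        print(lease_warn_share(report))
```

Plugging in the numbers from the issue: 1988 * 36 = 71568, and 71568 / 1049177 is about 0.068, which is where the "~7%" comes from.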
> Someone who understands this area of the code should either:
> * investigate & fix the code/test not to have these lease renewal problems
> * tweak our test logging configs to suppress these WARN messages



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)