[ 
https://issues.apache.org/jira/browse/SOLR-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359227#comment-14359227
 ] 

Dawid Weiss commented on SOLR-7215:
-----------------------------------

Uncomment the ThreadLeakFilters, Hoss. Nothing should get through. 
SolrIgnoredThreadsFilter has way too many exclusions -- these threads have to 
be shut down and cleaned up properly, not ignored (ignoring them leads to 
errors like this one):
{code}
    /*
     * IMPORTANT! IMPORTANT!
     * 
     * Any threads added here should have ABSOLUTELY NO SIDE EFFECTS
     * (should be stateless). This includes no references to cores or other
     * test-dependent information.
     */

    String threadName = t.getName();
    if (threadName.equals(TimerThread.THREAD_NAME)) {
      return true;
    }

    if (threadName.startsWith("facetExecutor-") || 
        threadName.startsWith("cmdDistribExecutor-") ||
        threadName.startsWith("httpShardExecutor-")) {
      return true;
    }
    
    // This is a bug in ZooKeeper where they call System.exit(11) when
    // this thread receives an interrupt signal.
    if (threadName.startsWith("SyncThread")) {
      return true;
    }

    // THESE ARE LIKELY BUGS - these threads should be closed!
    if (threadName.startsWith("Overseer-") ||
        threadName.startsWith("aliveCheckExecutor-") ||
        threadName.startsWith("concurrentUpdateScheduler-")) {
      return true;
    }

    return false;
{code}
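
As a rough illustration of what "shut down, not ignored" could look like for an executor-backed thread pool, here is a minimal sketch using only java.util.concurrent. The class name ExecutorShutdownSketch and the helper shutdownAndAwait are hypothetical and not taken from Solr; the executors named in the filter above may well need different handling.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorShutdownSketch {

  /**
   * Hypothetical helper: stops the pool and reports whether all of its
   * worker threads exited within the given timeout.
   */
  static boolean shutdownAndAwait(ExecutorService pool, long timeout, TimeUnit unit) {
    pool.shutdown(); // stop accepting new tasks, let queued ones finish
    try {
      if (!pool.awaitTermination(timeout, unit)) {
        pool.shutdownNow(); // interrupt tasks that are still running
        return pool.awaitTermination(timeout, unit);
      }
      return true;
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(() -> System.out.println("task ran"));
    boolean clean = shutdownAndAwait(pool, 5, TimeUnit.SECONDS);
    System.out.println("terminated cleanly: " + clean);
  }
}
```

A test or component that owns such a pool would call a helper like this from its close/teardown path, so that no worker thread outlives the suite and nothing needs to be added to the ignore list.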

> non reproducible Suite failures due to excessive sysout due to HDFS lease 
> renewal WARN logs due to connection refused -- even if test doesn't use HDFS 
> (ie: threads leaking between tests)
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7215
>                 URL: https://issues.apache.org/jira/browse/SOLR-7215
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: tests-report.txt_suite-failure-due-to-sysout.txt.zip
>
>
> On my local machine, i've noticed lately a lot of sporadic, non reproducible, 
> failures like these...
> {noformat}
>   2> NOTE: reproduce with: ant test  -Dtestcase=ScriptEngineTest 
> -Dtests.seed=E254A7E69EC7212A -Dtests.slow=true -Dtests.locale=sv 
> -Dtests.timezone=SystemV/CST6 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> [14:34:23.749] ERROR   0.00s J1 | ScriptEngineTest (suite) <<<
>    > Throwable #1: java.lang.AssertionError: The test or suite printed 10984 
> bytes to stdout and stderr, even though the limit was set to 8192 bytes. 
> Increase the limit with @Limit, ignore it completely with 
> @SuppressSysoutChecks or run with -Dtests.verbose=true
>    >  at __randomizedtesting.SeedInfo.seed([E254A7E69EC7212A]:0)
>    >  at 
> org.apache.lucene.util.TestRuleLimitSysouts.afterIfSuccessful(TestRuleLimitSysouts.java:212)
> {noformat}
> Invariably, looking at the logs of tests that fail for this reason, i see 
> multiple instances of these WARN msgs...
> {noformat}
>   2> 601361 T3064 oahh.LeaseRenewer.run WARN Failed to renew lease for 
> [DFSClient_NONMAPREDUCE_-253604438_2947] for 92 seconds.  Will retry shortly 
> ... java.net.ConnectException: Call From frisbee/127.0.1.1 to localhost:40618 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   2>  at sun.reflect.GeneratedConstructorAccessor268.newInstance(Unknown 
> Source)
>   2>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  ...
> {noformat}
> ...the full stack traces of these exceptions are typically 36 lines long 
> (not counting the suppressed "... 17 more" at the end).
> Doing some basic crunching of the "tests-report.txt" file from a recent run 
> of all "solr-core" tests (that caused the above failure) leads to some pretty 
> damn disconcerting numbers...
> {noformat}
> hossman@frisbee:~/tmp$ wc -l tests-report.txt_suite-failure-due-to-sysout.txt
> 1049177 tests-report.txt_suite-failure-due-to-sysout.txt
> hossman@frisbee:~/tmp$ grep "Suite: org.apache.solr" 
> tests-report.txt_suite-failure-due-to-sysout.txt | wc -l
> 465
> hossman@frisbee:~/tmp$ grep "LeaseRenewer.run WARN Failed to renew lease" 
> tests-report.txt_suite-failure-due-to-sysout.txt | grep 
> http://wiki.apache.org/hadoop/ConnectionRefused | wc -l
> 1988
> hossman@frisbee:~/tmp$ calc
> 1988 * 36
> 71568
> {noformat}
> So running 465 Solr test suites, we got ~2 thousand of these "Failed to renew 
> lease" WARNings.  Of the ~1 million total lines of log messages from all 
> tests, ~70 thousand (~7%) are coming from these WARNing messages -- which can 
> evidently be safely ignored?
> Something seems broken here.
> Someone who understands this area of the code should either:
> * investigate & fix the code/test not to have these lease renewal problems
> * tweak our test logging configs to suppress these WARN messages
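
The back-of-the-envelope numbers in the description (1988 warnings at roughly 36 lines each, out of 1049177 total lines) can be reproduced with a small sketch like the one below. The class LeaseWarnShare and its methods are hypothetical, and the sample log lines in main are made up for illustration; only the three figures passed to warnSharePercent come from the report.

```java
import java.util.Arrays;
import java.util.List;

public class LeaseWarnShare {

  /** Counts lines matching both patterns used by the grep pipeline in the report. */
  static long countLeaseWarnings(List<String> lines) {
    return lines.stream()
        .filter(l -> l.contains("LeaseRenewer.run WARN Failed to renew lease"))
        .filter(l -> l.contains("http://wiki.apache.org/hadoop/ConnectionRefused"))
        .count();
  }

  /** Estimated share (in percent) of all log lines taken up by the warnings and their traces. */
  static double warnSharePercent(long warnings, long linesPerWarning, long totalLines) {
    return 100.0 * warnings * linesPerWarning / totalLines;
  }

  public static void main(String[] args) {
    // Made-up sample lines, only to exercise the counting helper.
    List<String> sample = Arrays.asList(
        "oahh.LeaseRenewer.run WARN Failed to renew lease for [client] "
            + "see: http://wiki.apache.org/hadoop/ConnectionRefused",
        "some unrelated INFO line");
    System.out.println("matching sample lines: " + countLeaseWarnings(sample));

    // Figures taken from the report: 1988 warnings, ~36 lines each, 1049177 lines total.
    System.out.println("share of log lines (%): " + warnSharePercent(1988, 36, 1049177));
  }
}
```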


