Bojan Smid created SOLR-5691:
--------------------------------

             Summary: Unsynchronized WeakHashMap in SolrDispatchFilter causing 
issues in SolrCloud
                 Key: SOLR-5691
                 URL: https://issues.apache.org/jira/browse/SOLR-5691
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 4.6.1
            Reporter: Bojan Smid


I have a large SolrCloud setup: 7 nodes, each hosting a few thousand cores 
(leaders and replicas of the same shard live on different nodes), which may 
make the problem easier to notice.

A node can randomly get into a state where it stops responding to PeerSync /get 
requests from other nodes. When that happens, a thread dump of that node shows 
multiple entries like this one (one entry for each blocked request from another 
node; they do not go away over time):

"http-bio-8080-exec-1781" daemon prio=5 tid=0x440177200000 nid=0x25ae  [ JVM 
locked by VM at safepoint, polling bits: safep ]
   java.lang.Thread.State: RUNNABLE
        at java.util.WeakHashMap.get(WeakHashMap.java:471)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)

WeakHashMap's internal state can easily get corrupted when it is used without 
synchronization, in which case it is known to enter an infinite loop in the 
.get() call. It is very likely that this is what happens here. The reason 
others may not see this issue could be the huge number of cores I have in this 
system. The problem usually arises while some node is starting. It also does 
not happen on every start; it evidently depends on the timing of the events 
that lead to the map's corruption.

The fix may be as simple as changing:

  protected final Map<SolrConfig, SolrRequestParsers> parsers =
      new WeakHashMap<SolrConfig, SolrRequestParsers>();

to:

  protected final Map<SolrConfig, SolrRequestParsers> parsers =
      Collections.synchronizedMap(
          new WeakHashMap<SolrConfig, SolrRequestParsers>());

but there may be performance considerations around this, since this filter is 
the entry point into Solr.
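
For illustration, here is a minimal sketch of the synchronized variant in 
isolation. It is not Solr's actual code: the class name SynchronizedWeakCache 
and the getOrCreate helper are hypothetical, and java.util.function.Function 
assumes Java 8 or later. It also shows that a check-then-put sequence still 
needs an explicit lock on the wrapper, because Collections.synchronizedMap only 
makes each individual call atomic.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the parsers cache; not Solr's actual code. */
final class SynchronizedWeakCache<K, V> {

    // Each individual call on the wrapper is guarded by the wrapper's own monitor.
    private final Map<K, V> cache =
        Collections.synchronizedMap(new WeakHashMap<K, V>());

    /** Returns the cached value for key, creating it with factory if absent. */
    V getOrCreate(K key, Function<K, V> factory) {
        // Lock the wrapper so that get() and put() form one atomic step;
        // otherwise two threads could both miss and both create a value.
        synchronized (cache) {
            V value = cache.get(key);
            if (value == null) {
                value = factory.apply(key);
                cache.put(key, value);
            }
            return value;
        }
    }

    /** Number of live entries currently in the cache. */
    int size() {
        return cache.size();
    }
}
```

With this shape every request takes one uncontended-in-the-common-case lock per 
lookup, which is the performance cost mentioned above; whether it matters would 
need to be measured under real load.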



