Timothy Potter created SOLR-9050:
------------------------------------

             Summary: IndexFetcher not retrying after SocketTimeoutException correctly, which leads to trying a full download again
                 Key: SOLR-9050
                 URL: https://issues.apache.org/jira/browse/SOLR-9050
             Project: Solr
          Issue Type: Bug
          Components: replication (java)
    Affects Versions: 5.3.1
            Reporter: Timothy Potter
            Assignee: Timothy Potter


I'm seeing a problem where reading a large file from the leader (in SolrCloud 
mode) during index replication leads to a SocketTimeoutException:

{code}
2016-04-28 16:22:23.568 WARN  (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.IndexFetcher Error in fetching file: _405k.cfs (downloaded 7314866176 of 9990844536 bytes)
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at java.net.SocketInputStream.read(SocketInputStream.java:121)
        at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
        at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
        at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
        at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
        at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
        at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
        at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
        at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
        at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
        at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:140)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:167)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:161)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1312)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1275)
        at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
{code}

and this leads to the following error in cleanup:

{code}
2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Unable to download _405k.cfs completely. Downloaded 7314866176!=9990844536
        at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1406)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1286)
        at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
        at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:423)
        at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)

2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:165)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
{code}

So a single read timeout on one file causes the whole index to be 
downloaded again, and again, and again ...
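
For illustration only, here is a minimal sketch of the behavior one would hope for: after a read timeout, reopen the stream at the number of bytes already written and continue, rather than starting over. This is not IndexFetcher code. RangeSource, fetchWithResume and flakySource are hypothetical names, the in-memory buffer stands in for the partially downloaded file, and the sketch assumes the remote side can serve a file starting at a byte offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

class ResumableFetchSketch {

    /** Hypothetical source that can serve a file starting at a byte offset. */
    interface RangeSource {
        InputStream open(long offset) throws IOException;
    }

    /**
     * Copies totalSize bytes from source into out. On a read timeout the stream is
     * reopened at the number of bytes already written instead of at offset zero.
     */
    static long fetchWithResume(RangeSource source, ByteArrayOutputStream out,
                                long totalSize, int maxRetries) throws IOException {
        int retries = 0;
        byte[] buf = new byte[4]; // tiny buffer so the demo needs several reads
        while (out.size() < totalSize) {
            try (InputStream in = source.open(out.size())) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                if (out.size() < totalSize) {
                    // Not a timeout: the stream ended early, so give up.
                    throw new IOException("Stream ended at " + out.size() + " of " + totalSize);
                }
            } catch (SocketTimeoutException e) {
                if (++retries > maxRetries) {
                    throw e; // retry budget exhausted, surface the timeout
                }
                // Otherwise loop again; out.size() is the resume offset.
            }
        }
        return out.size();
    }

    /** Test double: the first stream times out after failAfter bytes, later ones work. */
    static RangeSource flakySource(byte[] data, int failAfter) {
        boolean[] failedOnce = {false};
        return offset -> {
            InputStream base =
                new ByteArrayInputStream(data, (int) offset, data.length - (int) offset);
            if (failedOnce[0]) {
                return base;
            }
            failedOnce[0] = true;
            return new InputStream() {
                int served = 0;
                @Override
                public int read() throws IOException {
                    if (served >= failAfter) {
                        throw new SocketTimeoutException("Read timed out");
                    }
                    served++;
                    return base.read();
                }
            };
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789abcdef".getBytes();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = fetchWithResume(flakySource(data, 6), out, data.length, 3);
        System.out.println("copied " + copied + " bytes after one simulated timeout");
    }
}
```

The point of the sketch is only that the bytes received before the timeout are kept and the retry starts from that offset, with a bounded number of retries before the failure is reported.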

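It also looks like any exception raised in fetchPackets would be squelched if 
cleanup (called in the finally block) raises an exception of its own, which 
would explain why only the "Unable to download ... completely" error reaches 
ReplicationHandler.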
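
To make the masking concrete, here is a small self-contained sketch of the general Java behavior, not the actual IndexFetcher code. The method names fetchPackets and cleanup are borrowed from the stack traces purely as stand-ins, and the exception types and messages thrown here are made up for the demo. The first variant shows the exception from the finally block replacing the original one; the second shows one possible way to keep the original and attach the cleanup failure via Throwable.addSuppressed.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

class FinallyMaskingSketch {

    // Stand-in for the packet-reading step: always fails with a read timeout.
    static void fetchPackets() throws IOException {
        throw new SocketTimeoutException("Read timed out");
    }

    // Stand-in for the cleanup step: fails because the download is incomplete.
    static void cleanup() {
        throw new IllegalStateException("Unable to download file completely");
    }

    // The exception thrown from finally replaces the one from the try block.
    static void fetchMasking() throws IOException {
        try {
            fetchPackets();
        } finally {
            cleanup();
        }
    }

    // One alternative: keep the primary exception, record the cleanup failure on it.
    static void fetchPreserving() throws IOException {
        Throwable primary = null;
        try {
            fetchPackets();
        } catch (IOException | RuntimeException e) {
            primary = e;
            throw e;
        } finally {
            try {
                cleanup();
            } catch (RuntimeException cleanupFailure) {
                if (primary != null) {
                    primary.addSuppressed(cleanupFailure);
                } else {
                    throw cleanupFailure;
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            fetchMasking();
        } catch (Exception e) {
            System.out.println("masking:    " + e);
        }
        try {
            fetchPreserving();
        } catch (Exception e) {
            System.out.println("preserving: " + e + " suppressed=" + e.getSuppressed().length);
        }
    }
}
```

With the second pattern the caller still sees the timeout as the primary failure, so it could decide to retry the file, while the cleanup error remains available for logging.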




