Timothy Potter created SOLR-9050:
------------------------------------
Summary: IndexFetcher not retrying after SocketTimeoutException correctly, which leads to trying a full download again
Key: SOLR-9050
URL: https://issues.apache.org/jira/browse/SOLR-9050
Project: Solr
Issue Type: Bug
Components: replication (java)
Affects Versions: 5.3.1
Reporter: Timothy Potter
Assignee: Timothy Potter
I'm seeing a problem where reading a large file from the leader (in SolrCloud
mode) during index replication leads to a SocketTimeoutException:
{code}
2016-04-28 16:22:23.568 WARN (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.IndexFetcher Error in fetching file: _405k.cfs (downloaded 7314866176 of 9990844536 bytes)
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
    at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
    at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
    at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
    at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
    at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
    at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:140)
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:167)
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:161)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1312)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1275)
    at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
{code}
and this leads to the following error in cleanup:
{code}
2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Unable to download _405k.cfs completely. Downloaded 7314866176!=9990844536
    at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1406)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1286)
    at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:800)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:423)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:380)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:162)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
2016-04-28 16:26:04.332 ERROR (RecoveryThread-foo_shard11_replica2) [c:foo s:shard11 r:core_node139 x:foo_shard11_replica2] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:165)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:437)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
{code}
So a single read timeout, even one that happens after most of a file has already been downloaded, leads to re-downloading the whole index again, and again, and again ...
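For illustration, here is a minimal sketch of what "retrying correctly" could look like: on a read timeout, retry the read from the byte offset already reached instead of failing the file and restarting the whole fetch. `ChunkSource`, `fetchWithRetry`, and the retry budget are hypothetical names chosen for this sketch, not IndexFetcher's actual API, and the in-memory buffer stands in for the partially written local index file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Arrays;

public class ResumableFetchSketch {

    /** Hypothetical source of file bytes that can start reading at any offset (not a Solr API). */
    interface ChunkSource {
        /** Reads up to buf.length bytes starting at offset; returns bytes read, or -1 at end of file. */
        int read(long offset, byte[] buf) throws IOException;
    }

    /**
     * Downloads a file of the expected size. On a read timeout it retries from the
     * current offset, keeping the bytes already received, instead of starting over.
     */
    static byte[] fetchWithRetry(ChunkSource source, long expectedSize, int maxRetries) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        long offset = 0;
        int retries = 0;
        while (offset < expectedSize) {
            int n;
            try {
                n = source.read(offset, buf);
            } catch (SocketTimeoutException e) {
                if (++retries > maxRetries) {
                    throw e; // give up only after the retry budget is spent
                }
                continue; // retry at the same offset; nothing already written is discarded
            }
            if (n < 0) {
                break; // source ended before expectedSize bytes arrived
            }
            out.write(buf, 0, n);
            offset += n;
            retries = 0; // progress was made, so reset the consecutive-failure counter
        }
        if (offset != expectedSize) {
            throw new IOException("Downloaded " + offset + "!=" + expectedSize);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[10_000];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        int[] calls = {0};
        // Simulated leader: every third read times out.
        ChunkSource flaky = (offset, buf) -> {
            if (++calls[0] % 3 == 0) {
                throw new SocketTimeoutException("Read timed out");
            }
            if (offset >= file.length) {
                return -1;
            }
            int n = (int) Math.min(buf.length, file.length - offset);
            System.arraycopy(file, (int) offset, buf, 0, n);
            return n;
        };
        byte[] got = fetchWithRetry(flaky, file.length, 3);
        System.out.println("complete and identical: " + Arrays.equals(file, got));
    }
}
```

Resetting the counter after each successful read means the budget limits consecutive timeouts rather than total timeouts, so a long transfer over a flaky link can still finish while a leader that has stopped responding is given up on quickly.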
It also looks like any exception raised in fetchPackets is squelched if cleanup (called in the finally block) also throws an exception.
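The masking is standard Java try/finally behavior: if the finally block throws, that exception replaces the one already propagating. The sketch below uses hypothetical stand-ins named after fetchPackets and cleanup (they are not Solr's code; IllegalStateException stands in for the SolrException from cleanup) to contrast that pattern with one that keeps the original failure primary and attaches the cleanup failure via `Throwable.addSuppressed`.

```java
import java.io.IOException;

public class FinallyMaskingSketch {

    // Hypothetical stand-in for fetchPackets(): fails with the root cause.
    static void fetchPackets() throws IOException {
        throw new IOException("Read timed out");
    }

    // Hypothetical stand-in for cleanup(): fails because the file is incomplete.
    static void cleanup() {
        throw new IllegalStateException("Unable to download file completely");
    }

    /** Pattern described in the report: cleanup() in finally hides fetchPackets()' exception. */
    static void fetchMasking() throws IOException {
        try {
            fetchPackets();
        } finally {
            cleanup(); // if this throws, the IOException above is discarded
        }
    }

    /** Keeps the root cause as the thrown exception and records the cleanup failure as suppressed. */
    static void fetchPreserving() throws IOException {
        Exception primary = null;
        try {
            fetchPackets();
        } catch (IOException | RuntimeException e) {
            primary = e;
            throw e;
        } finally {
            try {
                cleanup();
            } catch (RuntimeException cleanupError) {
                if (primary != null) {
                    primary.addSuppressed(cleanupError); // finally completes normally, so primary propagates
                } else {
                    throw cleanupError; // nothing else failed, so the cleanup error is the real failure
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            fetchMasking();
        } catch (Exception e) {
            System.out.println("masking:    " + e.getMessage());
        }
        try {
            fetchPreserving();
        } catch (Exception e) {
            System.out.println("preserving: " + e.getMessage()
                    + " (suppressed: " + e.getSuppressed()[0].getMessage() + ")");
        }
    }
}
```

With the second shape, the log would show the SocketTimeoutException as the cause of the failed fetch, with the size-mismatch error from cleanup attached beneath it rather than replacing it.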
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)