[
https://issues.apache.org/jira/browse/SOLR-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286485#comment-14286485
]
Mike Drob commented on SOLR-6969:
---------------------------------
Is retrying always going to be safe? That works fine after we've lost a server
and started a new one (albeit too quickly), but what about the case where two
servers both think they are responsible for the same tlog? This can happen if
the original server partially dies but still has some threads doing work that
haven't been cleaned up.
Looking at how other projects handle similar issues: HBase moves the entire
directory[1] to break any existing leases and ensure that any other process
gets kicked out. Maybe a retry is a good stop-gap, but is it going to be a
full solution?
[1]:
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L310
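Roughly what I have in mind from the HBase side, sketched under the assumption
that the tlogs for a core live in their own directory on HDFS. The class and
method names below are made up for illustration, not anything in the Solr code
base:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class TlogFence {
  // Rename the tlog directory before replay so any lingering writer from the
  // old process loses its path, then ask the NameNode to recover each file's
  // lease before we open anything for append.
  public static Path fenceAndRecover(FileSystem fs, Path tlogDir) throws IOException {
    Path fenced = new Path(tlogDir.getParent(), tlogDir.getName() + "-recovering");
    if (!fs.rename(tlogDir, fenced)) {
      throw new IOException("Could not rename " + tlogDir + " to fence out old writers");
    }
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      for (FileStatus stat : dfs.listStatus(fenced)) {
        // recoverLease() returns true once the lease is released; a real
        // implementation would poll until it does or a timeout expires.
        dfs.recoverLease(stat.getPath());
      }
    }
    return fenced;
  }
}
{code}
The rename is the fencing step: once the old process can no longer find the
path, recoverLease() just cleans up whatever the NameNode still thinks is open.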
> Just like we have to retry when the NameNode is in safemode on Solr startup,
> we also need to retry when opening a transaction log file for append when we
> get a RecoveryInProgressException.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-6969
> URL: https://issues.apache.org/jira/browse/SOLR-6969
> Project: Solr
> Issue Type: Bug
> Components: hdfs
> Reporter: Mark Miller
> Assignee: Mark Miller
> Priority: Critical
> Fix For: 5.0, Trunk
>
>
> This can happen after a hard crash and restart. The current workaround is to
> stop, wait it out, and start again. Instead, we should retry and wait a given
> amount of time, as we already do when we detect safe mode.
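For comparison, a minimal sketch of the retry-and-wait idea from the
description, assuming we match on the exception text because
RecoveryInProgressException typically arrives wrapped in a RemoteException
(again, names and timeouts here are illustrative only):
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TlogAppendRetry {
  // Keep retrying the append while the NameNode reports that lease recovery is
  // still in progress, the same way Solr already waits out safe mode on startup.
  public static FSDataOutputStream appendWithRetry(FileSystem fs, Path tlog,
      long timeoutMs, long pauseMs) throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      try {
        return fs.append(tlog);
      } catch (IOException e) {
        // RecoveryInProgressException usually shows up wrapped in a
        // RemoteException, so match on the message rather than the type.
        boolean recovering = String.valueOf(e.getMessage())
            .contains("RecoveryInProgressException");
        if (!recovering || System.currentTimeMillis() > deadline) {
          throw e;
        }
        Thread.sleep(pauseMs);
      }
    }
  }
}
{code}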