Gregory Chanan created SOLR-7378:
------------------------------------

             Summary: Be more conservative about loading a core when hdfs transaction log could not be recovered
                 Key: SOLR-7378
                 URL: https://issues.apache.org/jira/browse/SOLR-7378
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 5.0
            Reporter: Gregory Chanan


Today, if an HdfsTransactionLog cannot recover its lease, you get the following 
warning in the log:

{code}
      log.warn("Cannot recoverLease after trying for " +
        conf.getInt("solr.hdfs.lease.recovery.timeout", 900000) +
        "ms (solr.hdfs.lease.recovery.timeout); continuing, but may be DATALOSS!!!; " +
        getLogMessageDetail(nbAttempt, p, startWaiting));
{code}
from: 
https://github.com/apache/lucene-solr/blob/a8c24b7f02d4e4c172926d04654bcc007f6c29d2/solr/core/src/java/org/apache/solr/util/FSHDFSUtils.java#L145-L148

But some deployments may not want to continue when data loss is possible; they 
may prefer to investigate the underlying HDFS issue first.  Today there is also 
no way to find out what happened other than reading the logs.

There's a range of possibilities here, but here are a couple of ideas:
1) a config parameter that controls whether to continue despite potential data 
loss
2) load the core, but require a special flag to read potentially incorrect data 
(similar to shards.tolerant; data.tolerant or something?)
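
As a rough illustration of idea 1, here is a minimal sketch of a switch that decides whether to continue after lease recovery has given up. Everything in it is an assumption made for illustration: the class LeaseRecoveryPolicy, the method onLeaseRecoveryFailed, and the property name solr.hdfs.lease.recovery.continueOnFailure do not exist in Solr, and java.util.Properties stands in for the Hadoop Configuration object so the example runs on its own. The default of "true" keeps today's behavior.

```java
import java.io.IOException;
import java.util.Properties;

/** Hypothetical sketch of idea 1; none of these names exist in Solr. */
public class LeaseRecoveryPolicy {

  // Hypothetical property name, invented for this sketch.
  static final String CONTINUE_ON_FAILURE =
      "solr.hdfs.lease.recovery.continueOnFailure";

  /**
   * Intended to be called once lease recovery has timed out.
   * Returns true (after printing a warning) if the configuration allows
   * loading to continue; otherwise throws so the caller can refuse to
   * load the core.
   */
  static boolean onLeaseRecoveryFailed(Properties conf, String path)
      throws IOException {
    // Defaulting to "true" preserves the current warn-and-continue behavior.
    boolean continueOnFailure =
        Boolean.parseBoolean(conf.getProperty(CONTINUE_ON_FAILURE, "true"));
    if (!continueOnFailure) {
      throw new IOException("Cannot recoverLease for " + path
          + "; refusing to continue because " + CONTINUE_ON_FAILURE + "=false");
    }
    System.err.println("WARN Cannot recoverLease for " + path
        + "; continuing, but may be DATALOSS!!!");
    return true;
  }

  public static void main(String[] args) throws IOException {
    // Default configuration: warn and continue.
    onLeaseRecoveryFailed(new Properties(), "/example/tlog");

    // Conservative configuration: fail instead of continuing.
    Properties strict = new Properties();
    strict.setProperty(CONTINUE_ON_FAILURE, "false");
    try {
      onLeaseRecoveryFailed(strict, "/example/tlog");
    } catch (IOException e) {
      System.err.println("Core not loaded: " + e.getMessage());
    }
  }
}
```

Throwing rather than returning false is only one option; it lets the failure show up as a core load error instead of only as a log line, which speaks to the visibility concern above.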



