[ 
https://issues.apache.org/jira/browse/SOLR-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihaly Toth updated SOLR-8335:
------------------------------
    Attachment: SOLR-8335.patch

Attaching the proposal. Two further notes in addition to my comments above:
* If the lock is lost (taken over by a newly started node while this node was 
unable to update it), LockValidatingDirectoryWrapper already handles the 
potential problems by calling HdfsLock.ensureValid().
* This is an alternative to ZooKeeper-based locking. The lock is available 
under exactly the same conditions as the file to be written.

The changes are also available on [a branch of my fork|https://github.com/mihalytoth/lucene-solr/tree/master_hdfs_lock].
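
To illustrate the first note, here is a minimal sketch of a lock that is kept alive by periodic updates and can be taken over once it goes stale. This is not the code from SOLR-8335.patch: the class name HeartbeatLock, its methods, the staleness threshold and the lock file layout are all hypothetical, and it uses the local filesystem through java.nio.file rather than HDFS. Only the idea of an ensureValid() check that fails once another node owns the lock is taken from the comment above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical heartbeat-style lock, for illustration only.
 * The lock file holds "<lastHeartbeatMillis>,<ownerId>".
 * The check-then-write in tryObtain is NOT atomic; a real implementation
 * would need an atomic create or rename on the underlying filesystem.
 */
final class HeartbeatLock {
    private final Path lockFile;
    private final String ownerId;
    private final long staleAfterMillis;

    HeartbeatLock(Path lockFile, String ownerId, long staleAfterMillis) {
        this.lockFile = lockFile;
        this.ownerId = ownerId;
        this.staleAfterMillis = staleAfterMillis;
    }

    /** Acquires the lock if it is free, already ours, or stale. */
    boolean tryObtain(long nowMillis) throws IOException {
        if (Files.exists(lockFile)) {
            String[] parts = read();
            long lastBeat = Long.parseLong(parts[0]);
            boolean heldByOther = !parts[1].equals(ownerId);
            if (heldByOther && nowMillis - lastBeat < staleAfterMillis) {
                return false; // another owner updated the lock recently
            }
        }
        write(nowMillis);
        return true;
    }

    /** Refreshes the timestamp; fails if the lock was taken over. */
    void heartbeat(long nowMillis) throws IOException {
        ensureValid();
        write(nowMillis);
    }

    /** Throws if this owner no longer holds the lock. */
    void ensureValid() throws IOException {
        if (!Files.exists(lockFile) || !read()[1].equals(ownerId)) {
            throw new IOException("Lock lost: " + lockFile);
        }
    }

    private String[] read() throws IOException {
        String content = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8);
        return content.split(",", 2); // [timestamp, ownerId]
    }

    private void write(long nowMillis) throws IOException {
        Files.write(lockFile, (nowMillis + "," + ownerId).getBytes(StandardCharsets.UTF_8));
    }
}
```

In this sketch, a node that was killed leaves a lock file whose timestamp stops advancing, so a restarted node can obtain the lock once the threshold has passed. A node that is still alive but missed its updates gets an IOException from ensureValid() before its next write, which is the role the comment above attributes to LockValidatingDirectoryWrapper.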

> HdfsLockFactory does not allow core to come up after a node was killed
> ----------------------------------------------------------------------
>
>                 Key: SOLR-8335
>                 URL: https://issues.apache.org/jira/browse/SOLR-8335
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.2.1, 5.3, 5.3.1
>            Reporter: Varun Thacker
>         Attachments: SOLR-8335.patch
>
>
> When using HdfsLockFactory, if a node is killed instead of being shut down 
> gracefully, the write.lock file remains in HDFS. The next time you start the 
> node, the core doesn't load because of a LockObtainFailedException.
> I was able to reproduce this in all 5.x versions of Solr. The problem wasn't 
> there when I tested it in 4.10.4.
> Steps to reproduce this on 5.x:
> 1. Create directory in HDFS : {{bin/hdfs dfs -mkdir /solr}}
> 2. Start Solr: {{bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory 
> -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://localhost:9000/solr 
> -Dsolr.updatelog=hdfs://localhost:9000/solr}}
> 3. Create core: {{./bin/solr create -c test -n data_driven}}
> 4. Kill Solr
> 5. The lock file is still in HDFS and is called {{write.lock}}
> 6. Start Solr again and you get a stack trace like this:
> {code}
> 2015-11-23 13:28:04.287 ERROR (coreLoadExecutor-6-thread-1) [   x:test] 
> o.a.s.c.CoreContainer Error creating core [test]: Index locked for write for 
> core 'test'. Solr now longer supports forceful unlocking via 
> 'unlockOnStartup'. Please verify locks manually!
> org.apache.solr.common.SolrException: Index locked for write for core 'test'. 
> Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please 
> verify locks manually!
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723)
>         at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443)
>         at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked 
> for write for core 'test'. Solr now longer supports forceful unlocking via 
> 'unlockOnStartup'. Please verify locks manually!
>         at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
>         ... 9 more
> 2015-11-23 13:28:04.289 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] 
> o.a.s.c.CoreContainer Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core [test]
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:472)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Unable to create core [test]
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:737)
>         at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443)
>         at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434)
>         ... 5 more
> Caused by: org.apache.solr.common.SolrException: Index locked for write for 
> core 'test'. Solr now longer supports forceful unlocking via 
> 'unlockOnStartup'. Please verify locks manually!
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723)
>         ... 7 more
> Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked 
> for write for core 'test'. Solr now longer supports forceful unlocking via 
> 'unlockOnStartup'. Please verify locks manually!
>         at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
>         ... 9 more
> {code}
> In 4.10.4 I saw these two differences:
> 1. The lock file name was different. It's something like: 
> {{/solr/index/HdfsDirectory@46ad6bd3 
> lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@4b44b5f6-write.lock}}
> 2. When the node was started again after it was killed, it loaded the core 
> just fine, but there were now two lock files in HDFS. 4b44b5f6-write.lock is 
> the latest one:
> {code}
> /solr/index/HdfsDirectory@46ad6bd3 
> lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@4b44b5f6-write.lock
> /solr/index/HdfsDirectory@52959724 
> lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@9d59d3f-write.lock
> {code}



