[ 
https://issues.apache.org/jira/browse/SOLR-11988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364894#comment-16364894
 ] 

Hoss Man commented on SOLR-11988:
---------------------------------


Below is an example of the type of exception that gets logged when this test 
fails, from the log.txt file I just attached (where 2 _different_ replicas had 
this same problem).

Note that this log.txt file was produced using the attached 
SOLR-11988_nocommit_logging.patch that increases the logging verbosity in 
{{SolrCore.initIndex(...)}} -- hence the "nocommit" log line, and the line 
numbers not matching up exactly with current master -- but the net result is 
the same: In {{SolrCore.initIndex(...)}} the DirectoryFactory claims that the 
index directory for this brand new, never before in existence SolrCore, already 
exists and doesn't need to be initialized.  This then causes a problem when we 
try to open the "real" IndexWriter against it (using OpenMode.APPEND because we 
expect it to already exist)...

{noformat}
$ ant test  -Dtestcase=FullSolrCloudDistribCmdsTest -Dtests.method=test 
-Dtests.seed=E6FD3BCDEA5D2094 -Dtests.slow=true -Dtests.locale=ar-JO 
-Dtests.timezone=Asia/Aqtobe -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
...
   [junit4]   2> 33926 INFO  (qtp1926432793-173) [n:127.0.0.1:60391_kg_fmt 
c:collection2 s:shard6  x:collection2_shard6_replica_n32] o.a.s.c.SolrCore 
[collection2_shard6_replica_n32] nocommit: skipping creation of 
'/home/hossman/lucene/dev/solr/build/solr-core/test/J0/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.cloud.FullSolrCloudDistribCmdsTest_E6FD3BCDEA5D2094-001/shard-4-001/cores/collection2_shard6_replica_n32/data/index/'
 (aka: 
'/home/hossman/lucene/dev/solr/build/solr-core/test/J0/../../../../../../../../../home/hossman/lucene/dev/solr/build/solr-core/test/J0/temp/solr.cloud.FullSolrCloudDistribCmdsTest_E6FD3BCDEA5D2094-001/shard-4-001/cores/collection2_shard6_replica_n32/data/index')
 because dirFac (org.apache.solr.core.MockDirectoryFactory@768117c) says it 
exists
...
   [junit4]   2> 34763 ERROR (qtp1926432793-173) [n:127.0.0.1:60391_kg_fmt 
c:collection2 s:shard6  x:collection2_shard6_replica_n32] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error 
CREATEing SolrCore 'collection2_shard6_replica_n32': Unable to create core 
[collection2_shard6_replica_n32] Caused by: no segments* file found in 
LockValidatingDirectoryWrapper(MockDirectoryWrapper(RAMDirectory@63c826a3 
lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@17cdc1)): files: 
[]
   [junit4]   2>        at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
   [junit4]   2>        at 
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:91)
...
   [junit4]   2> Caused by: org.apache.solr.common.SolrException: Unable to 
create core [collection2_shard6_replica_n32]
   [junit4]   2>        at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1059)
   [junit4]   2>        at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:954)
   [junit4]   2>        ... 39 more
   [junit4]   2> Caused by: org.apache.solr.common.SolrException: Error opening 
new searcher
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:1013)
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
   [junit4]   2>        at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1043)
   [junit4]   2>        ... 40 more
   [junit4]   2> Caused by: org.apache.solr.common.SolrException: Error opening 
new searcher
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2100)
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2220)
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1096)
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:985)
   [junit4]   2>        ... 42 more
   [junit4]   2> Caused by: org.apache.lucene.index.IndexNotFoundException: no 
segments* file found in 
LockValidatingDirectoryWrapper(MockDirectoryWrapper(RAMDirectory@63c826a3 
lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@17cdc1)): files: 
[]
   [junit4]   2>        at 
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1072)
   [junit4]   2>        at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:119)
   [junit4]   2>        at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:94)
   [junit4]   2>        at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:257)
   [junit4]   2>        at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:131)
   [junit4]   2>        at 
org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2061)
   [junit4]   2>        ... 45 more
{noformat}
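
To make the failure mode above concrete, here is a minimal, self-contained sketch of the sequence. It is not Solr or Lucene code: {{FakeDirectory}}, {{FakeDirectoryFactory}}, {{openWriter}} and {{createCore}} are hypothetical stand-ins, and the {{claimsEverythingExists}} flag only simulates a factory giving the wrong answer (why the real factory does so is not established here). It assumes nothing beyond what the log shows: if the "does the index exist?" check wrongly answers yes, the create step is skipped and the later APPEND-mode open finds no segments file.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InitIndexSketch {

    /** Hypothetical stand-in for an index directory: just a set of file names. */
    static final class FakeDirectory {
        final Set<String> files = new HashSet<>();
    }

    /** Hypothetical stand-in for a directory factory whose exists() can be wrong. */
    static final class FakeDirectoryFactory {
        private final Map<String, FakeDirectory> dirs = new HashMap<>();
        private final boolean claimsEverythingExists; // simulates the wrong answer

        FakeDirectoryFactory(boolean claimsEverythingExists) {
            this.claimsEverythingExists = claimsEverythingExists;
        }

        boolean exists(String path) {
            return claimsEverythingExists || dirs.containsKey(path);
        }

        FakeDirectory get(String path) {
            return dirs.computeIfAbsent(path, p -> new FakeDirectory());
        }
    }

    enum OpenMode { CREATE, APPEND }

    /** CREATE writes an empty index; APPEND requires an existing segments file. */
    static void openWriter(FakeDirectory dir, OpenMode mode) {
        if (mode == OpenMode.CREATE) {
            dir.files.add("segments_1");
            return;
        }
        boolean hasSegments = dir.files.stream().anyMatch(f -> f.startsWith("segments"));
        if (!hasSegments) {
            throw new IllegalStateException("no segments* file found: files: " + dir.files);
        }
    }

    /** Mirrors the order of events described above: init check first, then the "real" writer. */
    static String createCore(FakeDirectoryFactory factory, String indexPath) {
        boolean alreadyExists = factory.exists(indexPath); // asked before the directory is obtained
        FakeDirectory dir = factory.get(indexPath);
        if (!alreadyExists) {
            openWriter(dir, OpenMode.CREATE); // initialize an empty index
        }
        try {
            openWriter(dir, OpenMode.APPEND); // the "real" writer expects an index
            return "ok";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(createCore(new FakeDirectoryFactory(false), "data/index"));
        System.out.println(createCore(new FakeDirectoryFactory(true), "data/index"));
    }
}
```

With an accurate factory the empty index is created first and the APPEND open succeeds; with the simulated wrong answer the create step is skipped and the APPEND open fails on an empty file list, which is the same shape as the "files: []" message in the log above.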






> FullSolrCloudDistribCmdsTest failures due to SolrCore initialization 
> incorrectly thinking index directory already exists?
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11988
>                 URL: https://issues.apache.org/jira/browse/SOLR-11988
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: SOLR-11988_nocommit_logging.patch, log.txt
>
>
> There have been quite a few Jenkins failures from FullSolrCloudDistribCmdsTest 
> that all seem to follow a similar pattern:
>  * Failure manifests as "Could not find collection:collection2"
>  * Failing seeds _frequently_ reproduce, but aren't guaranteed to
>  * Root cause can be traced back to the collection creation failing: one or 
> more replica cores fail because the brand new (Solr)IndexWriter expects to 
> find an existing segments file
>  ** SolrCore should have already created an (empty) index in 
> {{SolrCore.initIndex(...)}}
>  ** The fact that the {{SolrIndexWriter}} throws this exception in its 
> constructor suggests that the earlier call to {{SolrCore.initIndex(...)}} is 
> not functioning reliably
>  ** Based on some experimenting I've done, it seems like the underlying 
> problem is that in {{SolrCore.initIndex(...)}} the DirectoryFactory can "lie" 
> about whether a directory already exists.
> More details to follow in comments.



