[
https://issues.apache.org/jira/browse/SOLR-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039054#comment-16039054
]
Erick Erickson commented on SOLR-10806:
---------------------------------------
[~jpountz][~thetaphi][~mikemccand] Any insights here as the error is coming
from Lucene?
> Solr Replica goes down with NumberFormatException: Invalid shift value (64)
> in prefixCoded bytes (is encoded value really an INT?)
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-10806
> URL: https://issues.apache.org/jira/browse/SOLR-10806
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 6.3
> Reporter: Sachin Goyal
>
> Our Solr nodes go down within 20-30 minutes of indexing.
> The load rate does not appear to be the cause, because the exception in the
> logs points to a data problem:
> {color:darkred}
> INFO - 2017-06-02 23:21:19.094; org.apache.solr.core.SolrCore;
> \[node-instances_shard2_replica3\] Registered new searcher
> Searcher@6740879c\[node-instances_shard2_replica3\]
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_ne(6.3.0):C200591/8616:delGen=20)
> Uninverting(_wx(6.3.0):C72132/697:delGen=5)
> Uninverting(_y0(6.3.0):c5798/27:delGen=3)
> Uninverting(_yv(6.3.0):c10935/827:delGen=2)
> Uninverting(_z4(6.3.0):C4163/2277:delGen=1)))}
> ERROR - 2017-06-02 23:21:19.105; org.apache.solr.core.CoreContainer; Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core \[node-instances_shard2_replica3\]
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:526)
> at org.apache.solr.core.CoreContainer$$Lambda$38/199449817.run(Unknown Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$9/1611272577.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Unable to create core \[node-instances_shard2_replica3\]
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:855)
> at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:498)
> at org.apache.solr.core.CoreContainer$$Lambda$37/1402433372.call(Unknown Source)
> ... 6 more
> Caused by: java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)
> at org.apache.lucene.util.LegacyNumericUtils.getPrefixCodedLongShift(LegacyNumericUtils.java:163)
> at org.apache.lucene.util.LegacyNumericUtils$1.accept(LegacyNumericUtils.java:392)
> at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:232)
> at org.apache.lucene.index.Terms.getMax(Terms.java:169)
> at org.apache.lucene.util.LegacyNumericUtils.getMaxLong(LegacyNumericUtils.java:504)
> at org.apache.solr.update.VersionInfo.getMaxVersionFromIndex(VersionInfo.java:233)
> at org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1584)
> at org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610)
> at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:949)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:931)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)
> ... 8 more
> {color}
> It does not seem right that the Solr node itself should go down over such a
> problem.
> # Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException:
> org.apache.solr.common.SolrException: Unable to create core
> # Unable to create core
> # NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is
> encoded value really an INT?)
> i.e. core creation fails because of some confusion between long and
> integer.
> If there is a data issue, Solr should report it with an exception during
> ingestion.
> \\
> \\
> *UPDATE*:
> Another issue I see with the above problem is that the Solr cluster is
> completely inaccessible.
> Solr-UI does not come up either. I restarted the Solr servers and they refuse
> to recover.
> I am not even able to delete the collections and create them afresh.
> It seems the only way out is to do an *rm -rf* and re-install.
> Note that this is not network-related: I can ssh to the Solr machines and
> send messages to other Solr machines using nc.
> \\
> \\
> *UPDATE 2*:
> I had a 24-node cluster with 2 collections.
> Each collection used 6 nodes in a 2-shard, 3-replica configuration,
> so 12 of the 24 nodes were in use.
> The remaining 12 nodes ran Solr with the same zookeeper but had no
> collections/cores.
> After the above errors began to happen, Solr-UI on all 24 nodes became
> unresponsive!
> So I tried the delete-collection API from the command line, but got no
> response.
> Ultimately I ran delete-collection from the command line in a loop, and it
> deleted part of the collection.
> Then I had to manually delete the *<coreName>/data/index/write.lock* file on
> some nodes to purge those bad collections.
> It's been a few hours since then. There are no collections, yet a few nodes
> are still unresponsive, with the following messages in the logs:
> {color:brown}
> INFO - 2017-06-03 06:40:51.308; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> INFO - 2017-06-03 06:40:51.408; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> INFO - 2017-06-03 06:40:51.508; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> INFO - 2017-06-03 06:40:51.608; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> {color}
> It looks like a serious stability problem to me.
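
For readers trying to interpret the "Invalid shift value (64)" message in the quoted trace, here is a minimal sketch of how a shift of exactly 64 could arise. It is not the Lucene source. The class PrefixCodedShiftDemo and its methods are hypothetical, and the two header constants (0x20 for long terms, 0x60 for int terms) are my recollection of LegacyNumericUtils and should be checked against the 6.3 code. Under those assumptions, a term written with the int header at shift 0 and then read through the long path decodes to 0x60 - 0x20 = 64, which is outside the valid 0..63 range.

```java
public class PrefixCodedShiftDemo {
    // Assumed header constants, recalled from Lucene's LegacyNumericUtils
    // (not copied from the 6.3 source; verify before relying on them).
    static final byte SHIFT_START_LONG = 0x20;
    static final byte SHIFT_START_INT = 0x60;

    /** Hypothetical stand-in for the shift check applied to long terms. */
    static int longShift(byte[] term) {
        // The first byte of a prefix-coded term carries header + shift.
        int shift = term[0] - SHIFT_START_LONG;
        if (shift > 63 || shift < 0) {
            throw new NumberFormatException("Invalid shift value (" + shift
                    + ") in prefixCoded bytes (is encoded value really an INT?)");
        }
        return shift;
    }

    /** Builds only the header byte of a term; the value bytes are omitted. */
    static byte[] header(byte shiftStart, int shift) {
        return new byte[] {(byte) (shiftStart + shift)};
    }

    public static void main(String[] args) {
        // A long-encoded, full-precision term decodes to shift 0.
        System.out.println(longShift(header(SHIFT_START_LONG, 0)));
        // An int-encoded term read as a long is rejected.
        try {
            longShift(header(SHIFT_START_INT, 0));
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, the index contains int-encoded terms in a field that VersionInfo.getMaxVersionFromIndex reads as long. That would be consistent with the "confusion between long and integer" noted in the description, but the sketch alone does not establish how such terms got there.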