[ 
https://issues.apache.org/jira/browse/SOLR-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039117#comment-16039117
 ] 

Uwe Schindler commented on SOLR-10806:
--------------------------------------

This happens if you had a TrieLong/TrieDouble field in your index schema but later 
decided to change the schema to use a TrieInt/TrieFloat. The issue could be:
- The index was created with Long/Double
- The user decided to use another encoding, deleted all documents from the index, 
and rebuilt the index without physically deleting it
- The index still contains the old Long/Doubles, just marked as deleted
- When an IndexSearcher is opened, it fails at some point (at the first range query 
or, if there are no docvalues, when uninverting)

There is nothing that Solr can improve if one breaks index metadata. There is 
no way to get the core up and running. As this error is low-level, there is no 
way to wrap it in a more useful error message.
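
To illustrate why the decoder gives up, here is a minimal sketch that does not depend on Lucene. It assumes that trie-encoded terms start with a header byte of 0x20 + shift for longs and 0x60 + shift for ints; those two constants are recalled from LegacyNumericUtils in Lucene 6.x and should be checked against the source. The class and helper names are hypothetical. In the trace quoted below the failing call is getPrefixCodedLongShift, so a long decoder met a term that carries an int header, but the check fails in the same way whichever direction the schema change went.

```java
public class PrefixShiftSketch {
    // Assumed header offsets, recalled from LegacyNumericUtils (Lucene 6.x).
    static final byte SHIFT_START_LONG = 0x20;
    static final byte SHIFT_START_INT = 0x60;

    // Hypothetical helper: only the header byte of a long-encoded term.
    static byte[] longTermHeader(int shift) {
        return new byte[] { (byte) (SHIFT_START_LONG + shift) };
    }

    // Hypothetical helper: only the header byte of an int-encoded term.
    static byte[] intTermHeader(int shift) {
        return new byte[] { (byte) (SHIFT_START_INT + shift) };
    }

    // Mimics the range check a long decoder applies to the header byte.
    static int decodeLongShift(byte[] term) {
        int shift = term[0] - SHIFT_START_LONG;
        if (shift > 63 || shift < 0) {
            throw new NumberFormatException("Invalid shift value (" + shift
                + ") in prefixCoded bytes (is encoded value really an INT?)");
        }
        return shift;
    }

    public static void main(String[] args) {
        // A term written as a long decodes fine.
        System.out.println(decodeLongShift(longTermHeader(0)));
        // A term written as an int is rejected by the long decoder.
        try {
            decodeLongShift(intTermHeader(0));
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions an int term with shift 0 has header byte 0x60 = 96, and 96 - 32 = 64 falls outside the 0..63 range a long decoder accepts, which matches the "Invalid shift value (64)" in the report. The decoder only sees bytes in a segment and has no knowledge of the schema, so it cannot say more than that.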

> Solr Replica goes down with NumberFormatException: Invalid shift value (64) 
> in prefixCoded bytes (is encoded value really an INT?)
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10806
>                 URL: https://issues.apache.org/jira/browse/SOLR-10806
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.3
>            Reporter: Sachin Goyal
>
> Our Solr nodes go down within 20-30 minutes of indexing.
> It does not seem that load-rate is too high because the exception in the logs 
> is pointing to a data problem:
> {color:darkred}
> INFO  - 2017-06-02 23:21:19.094; org.apache.solr.core.SolrCore; 
> \[node-instances_shard2_replica3\] Registered new searcher 
> Searcher@6740879c\[node-instances_shard2_replica3\] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_ne(6.3.0):C200591/8616:delGen=20)
>  Uninverting(_wx(6.3.0):C72132/697:delGen=5) 
> Uninverting(_y0(6.3.0):c5798/27:delGen=3) 
> Uninverting(_yv(6.3.0):c10935/827:delGen=2) 
> Uninverting(_z4(6.3.0):C4163/2277:delGen=1)))}
> ERROR - 2017-06-02 23:21:19.105; org.apache.solr.core.CoreContainer; Error 
> waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core 
> \[node-instances_shard2_replica3\]
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at 
> org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:526)
>         at 
> org.apache.solr.core.CoreContainer$$Lambda$38/199449817.run(Unknown Source)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$9/1611272577.run(Unknown
>  Source)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Unable to create core 
> \[node-instances_shard2_replica3\]
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:855)
>         at 
> org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:498)
>         at 
> org.apache.solr.core.CoreContainer$$Lambda$37/1402433372.call(Unknown Source)
>         ... 6 more
> Caused by: java.lang.NumberFormatException: Invalid shift value (64) in 
> prefixCoded bytes (is encoded value really an INT?)
>         at 
> org.apache.lucene.util.LegacyNumericUtils.getPrefixCodedLongShift(LegacyNumericUtils.java:163)
>         at 
> org.apache.lucene.util.LegacyNumericUtils$1.accept(LegacyNumericUtils.java:392)
>         at 
> org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:232)
>         at org.apache.lucene.index.Terms.getMax(Terms.java:169)
>         at 
> org.apache.lucene.util.LegacyNumericUtils.getMaxLong(LegacyNumericUtils.java:504)
>         at 
> org.apache.solr.update.VersionInfo.getMaxVersionFromIndex(VersionInfo.java:233)
>         at 
> org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1584)
>         at 
> org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610)
>         at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:949)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:931)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)
>         ... 8 more
> {color}
> It does not seem right that the Solr node itself should go down for such a 
> problem.
> # Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core
> # Unable to create core
> # NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is 
> encoded value really an INT?)
> i.e. Core creation fails because there was some confusion between long and 
> integer.
> If there is a data issue then somehow it should communicate it with an 
> exception during ingestion.
> \\
> \\
> *UPDATE*:
> Another issue I see with the above problem is that the Solr cluster is completely 
> inaccessible.
> Solr-UI is also not coming up. I restarted the Solr servers and they refuse 
> to recover.
> I am not even able to delete the collections and create them afresh.
> It seems the only way out is to do an *rm -rf* and re-install.
> Note that it is not related to network as I can ssh to the Solr machines and 
> send messages to other Solr machines using nc
> \\
> \\
> *UPDATE 2*:
> I had a 24-node cluster with 2 collections.
> Each collection used 6 nodes and had a 2-shard, 3-replica configuration.
> So 12 of the 24 nodes were in use.
> The remaining 12 nodes had Solr running with the same ZooKeeper but no collections/cores.
> After the above errors began to happen, the Solr-UI of all 24 nodes became 
> unresponsive!
> So I tried the delete-collection API from the command line - no response.
> Ultimately I ran the delete-collection from the command line in a loop and it 
> deleted a part of the collection.
> Then I had to manually delete the *<coreName>/data/index/write.lock* file on 
> some nodes to purge those bad collections.
> It's been a few hours since then. There are no collections, and a few nodes 
> are still unresponsive, with the following messages in the logs:
> {color:brown}
> INFO  - 2017-06-03 06:40:51.308; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> INFO  - 2017-06-03 06:40:51.408; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> INFO  - 2017-06-03 06:40:51.508; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> INFO  - 2017-06-03 06:40:51.608; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> {color}
> It looks like a serious stability problem to me.


