[ 
https://issues.apache.org/jira/browse/SOLR-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039047#comment-16039047
 ] 

Shawn Heisey commented on SOLR-10806:
-------------------------------------

This looks like a very low-level Lucene problem, possibly caused by building 
an index with one schema and then changing the schema while continuing to use 
the existing index.  You indicate that this happens during collection 
creation, so I wonder whether some core/index directories are left over from a 
previous version of the collection.  If so, Solr may be trying to recreate a 
core in a directory that already exists and contains an index, and finding 
that the existing index isn't compatible with the new schema.
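
One way to check for leftover index directories like this is a small script 
along the following lines.  This is only a rough sketch, not anything Solr 
ships: the function name and the solr_home argument are made up for 
illustration, and it assumes the default <coreName>/data/index layout (the 
same layout as the write.lock path in the report below) and that a committed 
Lucene index contains at least one file named segments_N.  A core with a 
custom dataDir would not be found by it.

```python
import os


def find_existing_indexes(solr_home):
    """Return core directories under solr_home whose data/index directory
    already holds a committed Lucene index (a segments_N file).

    Hypothetical helper: solr_home is whatever directory holds the core
    directories on a given node.
    """
    leftovers = []
    for dirpath, _dirnames, filenames in os.walk(solr_home):
        parent = os.path.dirname(dirpath)
        # Only look at directories shaped like <coreName>/data/index.
        if os.path.basename(dirpath) != "index":
            continue
        if os.path.basename(parent) != "data":
            continue
        # Lucene records each commit point in a file named segments_N.
        if any(name.startswith("segments_") for name in filenames):
            leftovers.append(os.path.dirname(parent))
    return sorted(leftovers)
```

Calling find_existing_indexes() on a node's Solr home before recreating a 
collection would list any cores that still carry an old index, which are the 
ones to clear out or reindex.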

Generally speaking, most schema changes require a reindex, and sometimes the 
entire index must be completely wiped out before starting the reindex, because 
of problems like this.

https://wiki.apache.org/solr/HowToReindex

Low-level Lucene problems are very difficult for Solr to handle cleanly.  
You're right that this shouldn't cause everything to grind to a halt, but 
achieving a reasonable outcome may be challenging when the failure is this 
deep in Lucene.  We should try; I'm just warning you in advance that it might 
not be easy.


> Solr Replica goes down with NumberFormatException: Invalid shift value (64) 
> in prefixCoded bytes (is encoded value really an INT?)
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10806
>                 URL: https://issues.apache.org/jira/browse/SOLR-10806
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.3
>            Reporter: Sachin Goyal
>
> Our Solr nodes go down within 20-30 minutes of indexing.
> It does not seem that load-rate is too high because the exception in the logs 
> is pointing to a data problem:
> {color:darkred}
> INFO  - 2017-06-02 23:21:19.094; org.apache.solr.core.SolrCore; 
> \[node-instances_shard2_replica3\] Registered new searcher 
> Searcher@6740879c\[node-instances_shard2_replica3\] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_ne(6.3.0):C200591/8616:delGen=20)
>  Uninverting(_wx(6.3.0):C72132/697:delGen=5) 
> Uninverting(_y0(6.3.0):c5798/27:delGen=3) 
> Uninverting(_yv(6.3.0):c10935/827:delGen=2) 
> Uninverting(_z4(6.3.0):C4163/2277:delGen=1)))}
> ERROR - 2017-06-02 23:21:19.105; org.apache.solr.core.CoreContainer; Error 
> waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core 
> \[node-instances_shard2_replica3\]
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at 
> org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:526)
>         at 
> org.apache.solr.core.CoreContainer$$Lambda$38/199449817.run(Unknown Source)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$9/1611272577.run(Unknown
>  Source)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Unable to create core 
> \[node-instances_shard2_replica3\]
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:855)
>         at 
> org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:498)
>         at 
> org.apache.solr.core.CoreContainer$$Lambda$37/1402433372.call(Unknown Source)
>         ... 6 more
> Caused by: java.lang.NumberFormatException: Invalid shift value (64) in 
> prefixCoded bytes (is encoded value really an INT?)
>         at 
> org.apache.lucene.util.LegacyNumericUtils.getPrefixCodedLongShift(LegacyNumericUtils.java:163)
>         at 
> org.apache.lucene.util.LegacyNumericUtils$1.accept(LegacyNumericUtils.java:392)
>         at 
> org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:232)
>         at org.apache.lucene.index.Terms.getMax(Terms.java:169)
>         at 
> org.apache.lucene.util.LegacyNumericUtils.getMaxLong(LegacyNumericUtils.java:504)
>         at 
> org.apache.solr.update.VersionInfo.getMaxVersionFromIndex(VersionInfo.java:233)
>         at 
> org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1584)
>         at 
> org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610)
>         at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:949)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:931)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)
>         ... 8 more
> {color}
> It does not seem right that Solr Node itself should go down for such a 
> problem.
> # Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: 
> org.apache.solr.common.SolrException: Unable to create core
> # Unable to create core
> # NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is 
> encoded value really an INT?)
> i.e. Core creation fails because there was some confusion between long and 
> integer.
> If there is a data issue, Solr should report it with an exception during 
> ingestion.
> \\
> \\
> *UPDATE*:
> Another issue I see with the above problem is that the Solr cluster is 
> completely inaccessible.
> Solr-UI is also not coming up. I restarted the Solr servers and they refuse 
> to recover.
> I am not even able to delete the collections and create them afresh.
> It seems the only way out is to do an *rm -rf* and re-install.
> Note that it is not related to the network, as I can ssh to the Solr machines 
> and send messages to other Solr machines using nc.
> \\
> \\
> *UPDATE 2*:
> I had a 24 node cluster with 2 collections.
> Each collection used 6 nodes in a 2-shard, 3-replica configuration.
> So 12 of the 24 nodes were in use.
> The remaining 12 nodes ran Solr against the same ZooKeeper but had no 
> collections/cores.
> After the above errors began to happen, Solr-UI on all 24 nodes became 
> unresponsive!
> So I tried the delete-collection API from the command line - no response.
> Ultimately I ran the delete-collection from the command line in a loop and it 
> deleted a part of the collection.
> Then I had to manually delete the *<coreName>/data/index/write.lock* file on 
> some nodes to purge those bad collections.
> It's been a few hours since then. There are no collections, yet a few nodes 
> are still unresponsive, with the following messages in the logs:
> {color:brown}
> INFO  - 2017-06-03 06:40:51.308; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> INFO  - 2017-06-03 06:40:51.408; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> INFO  - 2017-06-03 06:40:51.508; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> INFO  - 2017-06-03 06:40:51.608; org.apache.solr.core.SolrCore; Core 
> sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking 
> again.
> {color}
> It looks like a serious stability problem to me.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
