https://issues.apache.org/jira/browse/SOLR-14245
There was a *production outage* at *odd hours* at my (and Noble's) client, due to the above change by *Andrzej Bialecki*, which shipped in Solr 8.5 onwards.

In short, there is a bug in Solr where a replica gets "null" as its node_name (upon invocation of a collection API command). On the rare occasions where we encountered such a situation in the past, the replica would be unavailable and the system would work fine overall. However, this change (which introduces strict validation of errors while *reading* Replica objects) now means that if such a situation arises (where one of Solr's own APIs results in node_name being null in a state.json), all SolrJ clients and all Solr nodes will go for a toss (possibly crash, and not start back up).

This change was rushed in *without any discussions or review*, without extensive testing for the failures it will cause on existing systems where the cluster state is messed up but the system is running, and *without any consideration for the impact on users*.

Noble and I are of the opinion that this change should be *reverted immediately*, considering the impact on users. However, there is *strong disagreement on Andrzej's part*. *Mistakes* happen, but *doubling down on them irrationally* [1] will destroy the reputation of the project, let alone the peace of mind of those who are running Solr in production.

Does anyone have any thoughts or opinions?

[1] - https://issues.apache.org/jira/browse/SOLR-14245?focusedCommentId=17346758&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17346758
