[ https://issues.apache.org/jira/browse/SOLR-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959219#comment-16959219 ]

Noble Paul commented on SOLR-13867:
-----------------------------------

I agree with your observations that Solr has become a kitchen sink of a
million things. There is feature creep, and there is a lot of badly written
code. My suggestion for fixing Solr is to make it modular.

 
 * Make a _Core Solr_ with only the essential features: indexing, search,
cloud-related functionality, a core set of APIs, and an easy way to load and
unload modules. All tests in _Core Solr_ must ALWAYS pass.
 * Everything else (DIH, Autoscaling, Streaming, all URPs, analyzers/tokenizers,
HDFS, security plugins, CDCR, etc.) moves out to optional modules that can be
installed only if and when necessary. Users should be able to pick and
choose what they want.
 * Carefully improve the APIs in _Core Solr_ so that modules can depend
purely on them. APIs should maintain backcompat (across major versions, if
possible) and should be properly documented.
 * Any additions to _Core Solr_ should go through proper vetting, possibly
after a vote in the community. Modules can have a more flexible policy.

> Make Solrcloud stable and performant and capable of having passing tests.
> -------------------------------------------------------------------------
>
>                 Key: SOLR-13867
>                 URL: https://issues.apache.org/jira/browse/SOLR-13867
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Major
>             Fix For: master (9.0)
>
>
> After spending a bit of time away from SolrCloud after being deeply involved
> in trying to stabilize it and its tests, I came back in 2018 and went deep
> into the system with the Starburst upgrade.
> What I found surprised me, though I guess it should not have. The system is
> slow, often silly, and super buggy, and it is not good at connection reuse,
> thread safety, efficient ZooKeeper communication, or efficient startup and shutdown.
> Often, the things we do to make tests pass make things worse, because you
> can't do things reasonably without major code work, and so we fight for
> test passes, not correctness.
> Twice now, I've seen the system in the shape it was supposed to take. FAST. 
> Not bug free, but 100X more solid at least and much, much, much, much faster.
> The current system is sick and actually getting worse under its weight as
> more is shoveled on top. Even compared to 1.5 years ago, the problems are
> worse, not better. Tests will never pass. Yes, our tests were in pretty bad
> shape. But you can put them in the best shape possible and it won't matter.
> The system will still fail tests.
> Sadly, I'm smart enough to know what has to be done, but not smart enough to 
> keep my work around after addressing most of the problems twice.
> Nonetheless, it's time to fix SolrCloud. It's not supposed to be this way.
> I've twice spent a week or two in a state with super fast SolrCloud. Super
> fast build system. Development is actually fun. You actually have a chance.
> I'm talking about tests you have never seen take under 45-60 seconds taking
> 5 seconds. Consistently. A different world.
> I spent a lot of time after Starburst making tests pass for me. Then a lot of
> time on a better build system that can help us improve development and good
> practices around the project. And then a lot of time making tests faster.
> These are important steps, but little itty bitty baby steps without 
> addressing the core rot that is growing. We don't find a problem and fully 
> understand what is up and craft a careful solution. We find something that we 
> can toss into the Grand Canyon, listen to it bounce around for a while, and
> if nobody screams, we move on to the next thing. That's not necessarily
> anyone's choice; there is little else you can do until the system is fixed.
> When that happens we can start making smart changes instead of just shoving 
> around the mess.
> Twice I have made the current system fast. What happens first? Nothing works. 
> The system doesn't know how to be fast. It doesn't have the thread safety or 
> proper logic to be fast. And that is not a place I want to be.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
