[ 
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361306#comment-17361306
 ] 

Mark Robert Miller edited comment on SOLR-14788 at 6/11/21, 12:31 AM:
----------------------------------------------------------------------

Solr2

Alright, so my thoughts on Solr2 - a concept that is essentially my take on 
what Solr could and should do, disregarding the limitations of the very old 
foundations most of it is built on today. 

These are my thoughts after a lot of research and experimentation. Perhaps not 
useful to many; as I’ve mentioned, I don’t see a lot of alignment with other 
directions. Just my opinion, but mostly I see a similar approach to the one 
that has carried the current direction, which largely focuses on working 
around the current foundation pieces. So, for what it’s worth.

High level and bullet point style, here is what I have found that seems to 
work much better and is often harnessed in other high-scale, efficient Java 
systems:

* Much more use of off-heap memory, with few and efficient system copies of 
data. Much less autoboxing, and fewer inefficient collection implementations 
and poorly sized collections. 
* More efficient and thoughtful use of concurrent algorithms and data 
structures.
* Much more use of asynchronous calls and of efficient async thread pools 
such as fork/join.
* More efficient and modern use of publish/subscribe models, such as the Flow 
APIs and reactive-style patterns.
* A focus on generating much less garbage via more reuse, implementations 
that produce less garbage, and more off-heap and byte buffer use. 
* Much more use of NIO2 in general. Fewer inefficient streams layered on top 
of efficient third-party NIO2 code. More concurrent and asynchronous IO.
* Modern webapp features, focus, and configuration. A modern, Jetty-focused 
webapp built with the latest servlet spec and available features in mind will 
blow the hell out of a lazy, generic servlet 2.x webapp model. 
* A focus on the efficient choice and configuration of third-party libs. There 
are gigantic differences depending on both.
* Full embrace of HTTP/2 and everything that involves. You can’t just plug it 
in and pray to get the advantages. You will often find disadvantages without 
deep inspection, exploration, and understanding.
* Efficient and controlled communication. You don’t need to request data 100 
times per second when it is needed perhaps every second or five. You rarely 
want to do anything more often than will actually make things faster. I like 
queuing up needs, allowing them to be combined and properly throttled. 
Publish/subscribe can also help.
* A focus on the number of connections and threads that actually make sense. 
We often learn little algorithms about the right number of threads, for 
example, and put little effort into ever coming close to achieving them.
* Thread pool implementations and separation that make sense. Most of the 
built-in thread pools in Java are very poor, from behavior to queues to thread 
management. Google and Stack Overflow worked out a lot of what to do instead 
long ago. Don’t mix long-blocking threads with pools that do quick and 
efficient async or straight in-memory tasks. 
* A focus on painful but simple-to-address inefficiencies. StringBuffers with 
default sizing that will surely grow. Inefficient string concatenation. Poor 
collection sizing or autoboxing behavior.
* Manage life cycles correctly, with better checks, proper ordering, and 
attention to outliers. Dividends paid everywhere.
* Make more parallelism happen, easily and often. Just as with off-heap 
memory, modern hardware zooms ahead; even a decade ago you had to multiply 
instances per host to even try to take some advantage.
* And I have a lot more thoughts and results - IMO, with payoff that far 
exceeds the model of working around behavior and efficiency problems. But 
rather than dig into them here, I believe I’ll get more return via other 
avenues down the road.
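
To make the thread pool separation point concrete, here is a minimal sketch, 
not anything taken from the Solr code base. The class name SeparatedPools, the 
slowFetch helper, and the pool sizes are all assumptions made up for 
illustration. The idea is only that the step that may block runs on its own 
bounded pool, while the quick in-memory step runs on a fork/join pool sized to 
the CPU count.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class SeparatedPools {
    // Bounded pool reserved for calls that may block (disk, network, locks).
    // The size of 4 is an arbitrary illustrative choice.
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

    // Work-stealing pool sized to the CPU count for short, non-blocking tasks.
    private final ForkJoinPool computePool =
            new ForkJoinPool(Runtime.getRuntime().availableProcessors());

    // Hypothetical blocking call standing in for slow I/O.
    private static String slowFetch(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-of-" + key;
    }

    // The blocking step and the quick CPU step never share threads.
    public CompletableFuture<Integer> fetchLength(String key) {
        return CompletableFuture
                .supplyAsync(() -> slowFetch(key), blockingPool)
                .thenApplyAsync(String::length, computePool);
    }

    public void shutdown() {
        blockingPool.shutdown();
        computePool.shutdown();
    }
}
```

A caller would chain further stages onto the returned future rather than 
blocking on it, so a slow fetch only ever ties up a thread in the blocking 
pool.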

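The "queuing up needs, allowing them to be combined" idea can be sketched in a 
similar way. This is one possible reading of that bullet, under the assumption 
that identical concurrent requests can share a single in-flight load. The 
class CoalescingLoader and its methods are hypothetical names, and throttling 
is left out to keep the sketch short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CoalescingLoader {
    // One shared future per key while a load for that key is in flight.
    private final ConcurrentHashMap<String, CompletableFuture<String>> inFlight =
            new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final AtomicInteger loads = new AtomicInteger();

    public CoalescingLoader(Function<String, String> loader) {
        this.loader = loader;
    }

    public CompletableFuture<String> get(String key) {
        CompletableFuture<String> created = new CompletableFuture<>();
        CompletableFuture<String> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            // Another caller already asked for this key: share its result.
            return existing;
        }
        // This caller owns the load; run it asynchronously on the common pool.
        CompletableFuture
                .supplyAsync(() -> {
                    loads.incrementAndGet();
                    return loader.apply(key);
                })
                .whenComplete((value, error) -> {
                    // Clear the entry first so later calls trigger a fresh load.
                    inFlight.remove(key, created);
                    if (error != null) {
                        created.completeExceptionally(error);
                    } else {
                        created.complete(value);
                    }
                });
        return created;
    }

    // Number of times the underlying loader was actually invoked.
    public int loadCount() {
        return loads.get();
    }
}
```

Two callers asking for the same key while the first load is still running 
receive the same future, so the underlying loader runs once for both.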




> Solr: The Next Big Thing
> ------------------------
>
>                 Key: SOLR-14788
>                 URL: https://issues.apache.org/jira/browse/SOLR-14788
>             Project: Solr
>          Issue Type: Task
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Critical
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> h3. 
> [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
>  Policeman is {color:#de350b}NOW{color} {color:#de350b}OFF{color} 
> duty!*{color}
> {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
> have some fun. Try to make some progress. Don't stress too much about the 
> impact of your changes or maintaining stability and performance and 
> correctness so much. Until the end of phase 1, I've got your back. I have a 
> variety of tools and contraptions I have been building over the years and I 
> will continue training them on this branch. I will review your changes and 
> peer out across the land and course correct where needed. As Mike D will be 
> thinking, "Sounds like a bottleneck Mark." And indeed it will be to some 
> extent. Which is why once stage one is completed, I will flip The Policeman 
> to off duty. When off duty, I'm always* *occasionally*{color} *down for some 
> vigilante justice, but I won't be walking the beat, all that stuff about sit 
> back and relax goes out the window.*_
> {quote}
>  
> I have stolen this title from Ishan or Noble and Ishan.
> This issue is meant to capture the work of a small team that is forming to 
> push Solr and SolrCloud to the next phase.
> I have kicked off the work with an effort to create a very fast and solid 
> base. That work is not 100% done, but it's ready to join the fight.
> Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
> Noble have already contributed support and testing and have plans for 
> additional work to shore up some of our current shortcomings.
> Others have expressed an interest in helping and hopefully they will pop up 
> here as well.
> Let's organize and discuss our efforts here and in various sub issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
