[
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361306#comment-17361306
]
Mark Robert Miller edited comment on SOLR-14788 at 6/11/21, 12:31 AM:
----------------------------------------------------------------------
Solr2
Alright, so my thoughts on Solr2 - a concept that is essentially my view of
what Solr could and should do, disregarding the limitations of the very old
foundations most of it is built on today.
These are my thoughts after a lot of research and experimentation. Perhaps not
useful to many; as I’ve mentioned, I don’t see a lot of alignment with other
directions. Just my opinion, but mostly I see an approach similar to the one
that has carried the current direction, which largely focuses on working
around the current foundation pieces. So, for what it’s worth.
High level and in bullet point style, here is what I have found that seems to
work much better and is often harnessed in other high-scale, efficient Java
systems:
* Much more use of off-heap memory, with few and efficient system copies of
data. Much less autoboxing and fewer inefficient collection implementations
and sizing choices.
* More efficient and thoughtful use of concurrent algorithms and data
structures.
* Much more use of asynchronous calls and of efficient async thread pools
like fork/join.
* More efficient and modern use of publish/subscribe models, such as the Flow
APIs and reactive-style patterns.
* A focus on generating much less garbage via more reuse, implementations that
generate garbage more sparingly, and more off-heap and byte buffer use.
* Much more use of NIO2 in general. Fewer inefficient streams layered on top
of efficient third-party NIO2 code. More concurrent and asynchronous IO.
* Modern webapp features, focus, and configuration. A modern, Jetty-focused
webapp built with the latest servlet spec and available features in mind will
blow the hell out of a lazy, generic servlet 2.x webapp model.
* A focus on the efficient choice and configuration of third-party libs. There
are gigantic differences depending on both.
* Full embrace of HTTP/2 and everything that involves. You can’t just plug it
in and pray to get the advantages. You will often find disadvantages without
deep inspection, exploration, and understanding.
* Efficient and controlled communication. You don’t need to request data 100
times per second when it is needed perhaps every second or five. You almost
never want to do anything more often than actually helps. I like queuing up
needs, allowing them to be combined and properly throttled. Publish/subscribe
can also help.
* A focus on the number of connections and threads that actually make sense.
We often learn little algorithms about the right number of threads, for
example, and put little effort into ever coming close to achieving them.
* Thread pool implementations and separation that make sense. Most of the
built-in thread pools in Java are very poor, from behavior to queues to thread
management. Google and Stack Overflow worked out a lot of what to do instead
long ago. Don’t mix long-blocking threads with pools that do quick and
efficient async or straight in-memory tasks.
* A focus on painful but simple-to-address inefficiencies. StringBuffers with
default sizing that will surely grow. Inefficient string concatenation. Poor
collection sizing or autoboxing behavior.
* Manage life cycles correctly, with better checks, proper ordering, and
attention to outliers. Dividends paid everywhere.
* Make more parallelism happen, easily and often. Just like with off-heap,
modern hardware zooms ahead; even a decade ago you had to multiply instances
per host to even try to take some advantage.
* And I have a lot more thoughts and results - IMO, with a payoff that far
exceeds the model of working around behavior and efficiency problems. But
rather than dig into them here, I believe I’ll get more return via other
avenues down the road.
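
To make the thread pool separation point above a bit more concrete, here is a
minimal Java sketch. It is only an illustration, not code from the Solr branch:
the class name PoolSeparationSketch, the blockingFetch helper, and the pool
sizes are all made up for the example. It also leans on stock JDK pools purely
to stay self-contained, even though the list above argues that those defaults
are often a poor fit and deserve tuned replacements.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

final class PoolSeparationSketch {
    // Small fixed pool reserved for calls that may block (disk, network, locks).
    // The size 4 is an arbitrary assumption for the sketch.
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

    // Work-stealing pool for short, non-blocking, in-memory steps.
    private final ForkJoinPool computePool =
            new ForkJoinPool(Runtime.getRuntime().availableProcessors());

    // Hypothetical slow call standing in for blocking I/O.
    private static String blockingFetch(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // The blocking stage runs on blockingPool; the cheap follow-up stage runs
    // on computePool, so a slow fetch never occupies a compute thread.
    CompletableFuture<Integer> fetchLength(String key) {
        return CompletableFuture
                .supplyAsync(() -> blockingFetch(key), blockingPool)
                .thenApplyAsync(String::length, computePool);
    }

    void shutdown() {
        blockingPool.shutdown();
        computePool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        PoolSeparationSketch sketch = new PoolSeparationSketch();
        try {
            int length = sketch.fetchLength("abc").get(5, TimeUnit.SECONDS);
            System.out.println(length); // length of "value-for-abc"
        } finally {
            sketch.shutdown();
        }
    }
}
```

The design choice being illustrated is only the split itself: callers get a
future back immediately, the slow part is confined to a pool sized for
blocking, and the quick part stays on a pool sized for the CPU count.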
> Solr: The Next Big Thing
> ------------------------
>
> Key: SOLR-14788
> URL: https://issues.apache.org/jira/browse/SOLR-14788
> Project: Solr
> Issue Type: Task
> Reporter: Mark Robert Miller
> Assignee: Mark Robert Miller
> Priority: Critical
> Time Spent: 4h
> Remaining Estimate: 0h
>
> The Policeman is NOW OFF duty!
>
> When The Policeman is on duty, sit back, relax, and have some fun. Try to
> make some progress. Don't stress too much about the impact of your changes
> or maintaining stability and performance and correctness so much. Until the
> end of phase 1, I've got your back. I have a variety of tools and
> contraptions I have been building over the years and I will continue
> training them on this branch. I will review your changes and peer out across
> the land and course correct where needed. As Mike D will be thinking,
> "Sounds like a bottleneck Mark." And indeed it will be to some extent. Which
> is why once stage one is completed, I will flip The Policeman to off duty.
> When off duty, I'm always occasionally down for some vigilante justice, but
> I won't be walking the beat, all that stuff about sit back and relax goes
> out the window.
>
> I have stolen this title from Ishan or Noble and Ishan.
> This issue is meant to capture the work of a small team that is forming to
> push Solr and SolrCloud to the next phase.
> I have kicked off the work with an effort to create a very fast and solid
> base. That work is not 100% done, but it's ready to join the fight.
> Tim Potter has started giving me a tremendous hand in finishing up. Ishan and
> Noble have already contributed support and testing and have plans for
> additional work to shore up some of our current shortcomings.
> Others have expressed an interest in helping and hopefully they will pop up
> here as well.
> Let's organize and discuss our efforts here and in various sub issues.