[
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028712#comment-14028712
]
David Smiley commented on SOLR-3585:
------------------------------------
bq. I saw quite often that a producer is a slowpoke (single thread SELECT), and
fast consumer doesn't help at all. That's why I prefer to burden client for
threads. Look at SolrCloud if client doesn't care about performance, it can
send update to any node, but if needs performance, it have to concern about
cluster topology, and send to nodes concurrently;
Sure; the bottleneck is _sometimes_ the source, in which case reading it with
multiple threads will improve performance. But that is independent of the ideal
number of threads for populating the target (Solr in this case). It's quite
possible that a given application will find that, to transfer the data with
maximum throughput, it needs to read from the source in X threads and populate
Solr in Y threads, where X and Y are not equal and neither is 1. I'm just
saying that the half of this that populates Solr should have its pool on the
Solr side. I'm saying nothing about the data-source consumption end.
And your comments about JEE (i.e. servlet-container) concurrent request limits
are not applicable, because such a limit can't be targeted at just updates,
which is where the real constraint is. Not to mention the JEE container option
is going away for v5 anyway.
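To make the X/Y point concrete, here is a rough sketch of a transfer in which
the source is read by X threads and the target is populated by Y threads, with
a bounded queue in between. It is only an illustration under stated
assumptions: the class name DecoupledPipeline, the queue size, and the counter
standing in for "send the doc to Solr" are all hypothetical, and nothing here
is Solr API or code from the attached patches.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: X source-reader threads, Y target-writer threads. */
public class DecoupledPipeline {
    // Unique sentinel object; compared by identity so no real doc can match it.
    private static final String POISON = new String("__END__");

    /** Reads docsPerReader docs in each of x readers; returns how many docs the y writers handled. */
    public static int run(int x, int y, int docsPerReader) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(64); // bounded: applies back-pressure
        AtomicInteger indexed = new AtomicInteger();
        ExecutorService readers = Executors.newFixedThreadPool(x);
        ExecutorService writers = Executors.newFixedThreadPool(y);
        try {
            for (int w = 0; w < y; w++) {
                writers.execute(() -> {
                    try {
                        while (true) {
                            String doc = queue.take();
                            if (doc == POISON) {
                                return; // one sentinel per writer ends that writer
                            }
                            indexed.incrementAndGet(); // assumption: stand-in for indexing the doc
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            for (int r = 0; r < x; r++) {
                final int id = r;
                readers.execute(() -> {
                    try {
                        for (int i = 0; i < docsPerReader; i++) {
                            queue.put("doc-" + id + "-" + i); // assumption: stand-in for a SELECT row
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Wait for all readers, then tell each writer to stop.
            readers.shutdown();
            readers.awaitTermination(1, TimeUnit.MINUTES);
            for (int w = 0; w < y; w++) {
                queue.put(POISON);
            }
            writers.shutdown();
            writers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while transferring", e);
        }
        return indexed.get();
    }
}
```

Calling DecoupledPipeline.run(3, 2, 100) reads with three threads and writes
with two; the two counts are tuned independently, which is the whole point.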
bq. I can agree that it might make sense to introduce separate thread pool for
handling updates, but only for limiting number of threads to avoid unnecessary
index segmentation due to concurrent flush
_Another_ reason for the Solr-side pool. A customer of mine more or less hit
this because they didn't realize that Solr couldn't handle 100 pipes shoving
data into it. Solr shouldn't fall over in such a case; it should use only as
many threads as its config file allows. I propose it default to a mode in
which the consumer thread is the indexing thread (as it is now) but limited to
2, with other options available of course, such as a dedicated thread pool.
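One way to picture the proposed default mode is a permit-based limiter: the
work still runs on the caller's (consumer) thread, but only a configured
number of callers may be indexing at once. The sketch below is an assumption
about how such a limit could look, not the patch's code; BoundedIndexer and
its methods are hypothetical names, and reading "limited to 2" as "at most 2
concurrent indexing threads" is my interpretation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration: caller-thread indexing, capped at a configured concurrency. */
public class BoundedIndexer {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    public BoundedIndexer(int maxIndexingThreads) {
        this.permits = new Semaphore(maxIndexingThreads);
    }

    /** Runs the work on the calling thread, blocking while the limit is already reached. */
    public void index(Runnable work) {
        permits.acquireUninterruptibly();
        try {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
            work.run();
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    public int peakConcurrency() {
        return peak.get();
    }

    /** Starts `clients` threads that all push work at once; returns the observed peak concurrency. */
    public static int demo(int clients, int limit) {
        BoundedIndexer indexer = new BoundedIndexer(limit);
        Thread[] threads = new Thread[clients];
        for (int i = 0; i < clients; i++) {
            threads[i] = new Thread(() -> indexer.index(BoundedIndexer::simulateIndexing));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for clients", e);
            }
        }
        return indexer.peakConcurrency();
    }

    // Assumption: a short sleep stands in for real indexing work.
    private static void simulateIndexing() {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With BoundedIndexer.demo(100, 2), a hundred clients can connect but the peak
number indexing at the same moment never exceeds the configured 2; the rest
simply wait their turn instead of overwhelming the target.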
bq. One note, now the code here does the same as ConcurrentUpdateSolrServer,
it's not good at all. I'd prefer to extract core thread code from CUSS, and
reuse it in this update processor with in-process sink for sure.
Makes sense.
> processing updates in multiple threads
> --------------------------------------
>
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0-ALPHA, 5.0
> Reporter: Mikhail Khludnev
> Attachments: SOLR-3585.patch, SOLR-3585.patch, multithreadupd.patch,
> report.tar.gz
>
>
> Hello,
> I'd like to contribute update processor which forks many threads which
> concurrently process the stream of commands. It may be beneficial for users
> who streams many docs through single request.