[jira] [Commented] (SOLR-3585) processing updates in multiple threads

David Smiley (JIRA) Wed, 11 Jun 2014 18:46:18 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028712#comment-14028712
 ]


David Smiley commented on SOLR-3585:
------------------------------------

bq. I've quite often seen the producer be the slowpoke (a single-threaded SELECT), 
and a fast consumer doesn't help at all. That's why I prefer to burden the client 
with threads. Look at SolrCloud: if the client doesn't care about performance, it 
can send updates to any node, but if it needs performance, it has to be aware of 
the cluster topology and send to nodes concurrently.

Sure; the bottleneck is _sometimes_ the source, in which case reading from it in 
multiple threads will increase throughput.  But that is independent of the ideal 
number of threads for populating the target (Solr in this case).  A given 
application may well find that, to transfer the data with maximum throughput, it 
needs to read from the source in X threads and populate Solr in Y threads, where 
X and Y are unequal and neither is 1. I'm just saying that the half of this that 
populates Solr should have its thread pool on the Solr side.  I'm saying nothing 
about the data-source consumption end.

And your comments about JEE (i.e. servlet-container) concurrent request limits 
don't apply, because such a limit can't be targeted at updates alone, which is 
where the real constraint is.  Not to mention that the JEE container option is 
going away in v5 anyway.

bq. I can agree that it might make sense to introduce a separate thread pool for 
handling updates, but only to limit the number of threads and so avoid unnecessary 
index segmentation due to concurrent flushes.

_Another_ reason for the Solr-side pool.  A customer of mine sort of hit this 
because they didn't understand that Solr couldn't handle 100 pipes shoving data 
into it.  Solr shouldn't fall over in such a case; it should use only as many 
threads as its config file allows.  I propose defaulting to a mode in which the 
consumer thread is the indexing thread (as it is now) but limited to 2... and of 
course offering other options, such as a dedicated thread pool.
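One way the proposed default could behave is sketched below. It assumes a plain `java.util.concurrent.Semaphore`, and the class name `IndexingThrottle` and the 2 ms "index" step are hypothetical, not Solr code. Each request thread still does its own indexing, but only the configured number may do so at once; the rest wait instead of piling onto the index. A dedicated thread pool would be the alternative mode.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "consumer thread == indexing thread, but limited to N":
// each request thread indexes its own documents, but only N of them may do so at once.
class IndexingThrottle {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    IndexingThrottle(int maxIndexingThreads) {
        // Fair semaphore: waiting request threads are admitted in arrival order.
        this.permits = new Semaphore(maxIndexingThreads, true);
    }

    /** Runs `indexWork` on the calling thread once a permit is free; blocks otherwise. */
    void index(Runnable indexWork) throws InterruptedException {
        permits.acquire();
        try {
            maxObserved.accumulateAndGet(active.incrementAndGet(), Math::max);
            indexWork.run();
        } finally {
            active.decrementAndGet();
            permits.release();
        }
    }

    int maxObservedConcurrency() {
        return maxObserved.get();
    }

    /** Simulates `pipes` clients pushing at once; returns the peak number of concurrent indexers. */
    static int simulate(int pipes, int limit) {
        IndexingThrottle throttle = new IndexingThrottle(limit);
        Thread[] clients = new Thread[pipes];
        for (int i = 0; i < pipes; i++) {
            clients[i] = new Thread(() -> {
                try {
                    throttle.index(() -> {
                        try {
                            Thread.sleep(2); // stand-in for adding a document to the index
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            clients[i].start();
        }
        try {
            for (Thread t : clients) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return throttle.maxObservedConcurrency();
    }
}
```

With `simulate(100, 2)`, a hundred "pipes" push at once, yet no more than two threads are ever inside the indexing step.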

bq. One note: the code here currently does the same thing as 
ConcurrentUpdateSolrServer, which isn't good at all. I'd prefer to extract the 
core thread code from CUSS and reuse it in this update processor, with an 
in-process sink of course.

Makes sense.
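The extraction idea might look roughly like the sketch below. It assumes the shared part is "N threads draining a queue in batches" and the varying part is the sink. `BatchingRunner` and its constructor parameters are hypothetical and do not reflect how CUSS is actually structured. A CUSS-like client would pass a sink that sends each batch over HTTP; the update processor would pass one that indexes in-process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: queue-draining worker threads written once, parameterized by a sink.
class BatchingRunner<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Consumer<List<T>> sink; // must be thread-safe: several workers call it
    private final int batchSize;
    private final ExecutorService workers;
    private volatile boolean closed;

    BatchingRunner(Consumer<List<T>> sink, int threads, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) workers.submit(this::drain);
    }

    void add(T item) {
        queue.add(item); // unbounded queue in this sketch, so add never blocks
    }

    // Each worker pulls up to batchSize items at a time and hands them to the sink.
    private void drain() {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            while (!closed || !queue.isEmpty()) {
                T first = queue.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch, batchSize - 1);
                sink.accept(new ArrayList<>(batch));
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the workers once everything already queued has reached the sink. */
    void close() {
        closed = true;
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Only the `Consumer<List<T>>` changes between the remote and in-process cases, which is the reuse being suggested.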

> processing updates in multiple threads
> --------------------------------------
>
>                 Key: SOLR-3585
>                 URL: https://issues.apache.org/jira/browse/SOLR-3585
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 4.0-ALPHA, 5.0
>            Reporter: Mikhail Khludnev
>         Attachments: SOLR-3585.patch, SOLR-3585.patch, multithreadupd.patch, 
> report.tar.gz
>
>
> Hello,
> I'd like to contribute an update processor that forks many threads which 
> concurrently process the stream of commands. It may benefit users 
> who stream many docs through a single request. 
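The description doesn't say how commands from one request are spread across threads. One plausible policy, assumed here and not taken from the attached patches, is to route each command by document id to one of N single-threaded lanes. That way an add followed by a delete of the same document is still applied in stream order, while different documents proceed in parallel. `ForkingUpdateProcessor`, `process`, and `history` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one incoming stream of update commands fanned out to N worker lanes.
class ForkingUpdateProcessor implements AutoCloseable {
    private final ExecutorService[] lanes;
    // Records what was applied per id; stands in for the real index.
    private final Map<String, List<String>> applied = new ConcurrentHashMap<>();

    ForkingUpdateProcessor(int threads) {
        lanes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) lanes[i] = Executors.newSingleThreadExecutor();
    }

    // Same id -> same single-threaded lane, so per-document order is preserved.
    void process(String op, String id) {
        int lane = Math.floorMod(id.hashCode(), lanes.length);
        lanes[lane].execute(() -> applied.computeIfAbsent(id, k -> new ArrayList<>()).add(op));
    }

    List<String> history(String id) {
        return applied.getOrDefault(id, List.of());
    }

    @Override
    public void close() {
        for (ExecutorService lane : lanes) lane.shutdown();
        for (ExecutorService lane : lanes) {
            try {
                lane.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Hashing by id trades perfect load balance for per-document ordering. A round-robin dispatch would balance better but could reorder commands that touch the same document.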


