[
https://issues.apache.org/jira/browse/SOLR-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033245#comment-14033245
]
David Smiley commented on SOLR-3585:
------------------------------------
I reviewed your patch. Overall I really like it!
* Please don’t include in the patch changes to files that appear to do have no
effect like java import modifications as seen in TestSolrJ. I suspect you did
some refactoring and reverted and some harmless deltas were left.
* CUSS: have only one constructor that does the constructing; the other
constructors can call the one.
* I think instead of having a Callback interface and createCallback() method,
the user of ThreadBurster should simply subclass ThreadBurster and implement
the handle(), finish() and handleError() methods. It's already abstract
_anyway_. After doing so, you could omit passing any arguments to handle()
except for 'E'. Then consider making more fields on ThreadBurster private,
adding some convenience methods for adding & polling the queue (maybe).
* This wasn't introduced by you but runnerLock is weird: it's a Lock but not
used for locking, it's used in blockUnitlFinished() (which is a typo!) as a way
to wait until there are no runners. Furthermore, blockUnitlFinished appears to
wait longer than ideal since "pollQueueTime" time must pass. At a minimum, it
would be more clear if it were using CountDownLatch. Further improvements
would be nice in a separate issue, to include javadocs on blockUntilFinished.
* ThreadedUpdateProcesor.processRollback() shouldn’t call checkTrouble()
* I wonder if there’s a nicer way to hook this in without a new “backing.chain”
param? Like if you could simply list this in a series of other URPs. Not a
serious problem but it's a little awkward.
* I like your tests, particularly the use of DelayProcessorFactory.
* I think preClose is the wrong place to shut down the Executor; look at its
docs. If you do it in postClose; it should be safe.
* I don’t care for some of the variable names. “pipes” should be something
like “numPipes” or “pipeCount” etc. When I see a variable named as you have, I
think it actually *is* the set of pipes, versus the number of them. Likewise
“buffer” should be “bufferSize” or something. Note that CUSS uses
“threadCount” and “queueSize” which are more clear to at least me. I also saw
“prm” vs. “params” though sometimes I prefer names like “defaultParams” etc. if
there is ambiguity as to which params.
* It’s a shame that commit, delete, etc. (except add) have to do a
blockUntilFinished which effects *other* concurrent streams the client
(potentially multiple asynchronous update clients) may be sending. It’s not a
big deal but not ideal. The separate buffers (per concurrent load) minimize
that, at least. But people need to be aware there is a
buffer-per-concurrent-connection. At this point (progress not perfection) it’s
probably not worth concerning ourselves with optimizing this to a single buffer
and a smarter blockUntilFinished.
* It’s worth documenting that DistributedUpdateProcessorFactory ought to be
*before* this URP.
* Masking all exceptions as IOException isn’t right. SolrException should pass
through. FYI Guava’s Throwables may be handy here.
> processing updates in multiple threads
> --------------------------------------
>
> Key: SOLR-3585
> URL: https://issues.apache.org/jira/browse/SOLR-3585
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 4.0-ALPHA, 5.0
> Reporter: Mikhail Khludnev
> Attachments: SOLR-3585.patch, SOLR-3585.patch, SOLR-3585.patch,
> multithreadupd.patch, report.tar.gz
>
>
> Hello,
> I'd like to contribute update processor which forks many threads which
> concurrently process the stream of commands. It may be beneficial for users
> who streams many docs through single request.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]