This is a classic example of why *every change to default code paths* for
core components must be accompanied by performance benchmarks.

On Tue, 20 Oct, 2020, 1:35 pm Thomas Wöckinger, <[email protected]>
wrote:

> Did you have time to look at this?
>
> On Tue, Oct 13, 2020 at 2:43 PM David Smiley (Jira) <[email protected]>
> wrote:
>
>>
>>     [ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213093#comment-17213093 ]
>>
>> David Smiley commented on SOLR-14923:
>> -------------------------------------
>>
>> I am responsible for this bug, along with [~moshebla], the contributor of
>> SOLR-12638. Perhaps the single bit of code I most regret committing on
>> behalf of another is the few lines you have found, Thomas. I expressed my
>> reservations at the time:
>>
>>
>> https://issues.apache.org/jira/browse/SOLR-12638?focusedCommentId=16872898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16872898
>>
>> bq. What gnaws at me is that this "UpdateLog.openRealtimeSearcher" is
>> being called optimistically on a new doc because maaaayyyybeee some future
>> atomic update will need to see it. And not just any type of atomic update;
>> one that is directly to a nested child doc (something I consider highly
>> experimental). It's as if we're optimizing for making that future atomic
>> update faster by doing work in advance that will, I think, very rarely
>> actually be used. It's a tragedy, if I'm understanding this right.
>>
>> There was some earlier conversation about it in that issue as well. It's
>> difficult for me to say at the moment what the fix is, because that is
>> fairly complex, low-level Solr code that I think few people understand
>> well. Nonetheless, I'll look into it further this week.
>>
>>
>> > Indexing performance is unacceptable when child documents are involved
>> > ----------------------------------------------------------------------
>> >
>> >                 Key: SOLR-14923
>> >                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>> >             Project: Solr
>> >          Issue Type: Bug
>> >      Security Level: Public(Default Security Level. Issues are Public)
>> >          Components: update, UpdateRequestProcessors
>> >    Affects Versions: master (9.0), 8.3, 8.4, 8.5, 8.6
>> >            Reporter: Thomas Wöckinger
>> >            Priority: Critical
>> >              Labels: performance
>> >
>> > Parallel indexing does not make sense at the moment when child documents
>> > are used.
>> > The org.apache.solr.update.processor.DistributedUpdateProcessor checks,
>> > at the end of the doVersionAdd method, whether the ulog caches should be
>> > refreshed.
>> > This check returns true if any child document is included in the
>> > AddUpdateCommand.
>> > If so, ulog.openRealtimeSearcher() is called. This call is very
>> > expensive and is executed in a synchronized block on the UpdateLog
>> > instance, so all other operations on the UpdateLog are blocked too.
>> > Because every important UpdateLog method (add, delete, ...) uses a
>> > synchronized block, almost every operation is blocked.
>> > This reduces multi-threaded index updates to single-threaded behavior.
>> > The described behavior does not depend on any option of the
>> > UpdateRequest, so it makes no difference whether 'waitFlush',
>> > 'waitSearcher' or 'softCommit' is true or false.
>> > The described behavior makes child documents unusable, because the
>> > performance is unacceptable.
>> >
>> >
>>
>>
>>
>> --
>> This message was sent by Atlassian Jira
>> (v8.3.4#803005)
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
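To make the contention described in SOLR-14923 concrete, here is a small, self-contained Java sketch. It does not use Solr at all: `FakeUpdateLog`, `run` and the 20 ms sleep are hypothetical stand-ins, assuming only what the report states, namely that the UpdateLog methods synchronize on the same instance and that openRealtimeSearcher() is expensive. It times a fixed thread pool with and without the expensive call and records the maximum number of threads ever inside the log at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UpdateLogContentionSketch {

    /** Hypothetical stand-in for UpdateLog: every method locks the same monitor (this). */
    static class FakeUpdateLog {
        private final AtomicInteger inside = new AtomicInteger();
        private final AtomicInteger maxInside = new AtomicInteger();
        private int ops = 0; // guarded by this

        // Record how many threads are inside a locked method right now.
        private void enter() {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
        }

        private void exit() {
            inside.decrementAndGet();
        }

        /** Cheap operation, but it still has to wait for the monitor. */
        synchronized void add() {
            enter();
            try {
                ops++;
            } finally {
                exit();
            }
        }

        /** Models an expensive searcher reopen; the cost is simulated with a sleep. */
        synchronized void openRealtimeSearcher() throws InterruptedException {
            enter();
            try {
                Thread.sleep(20);
                ops++;
            } finally {
                exit();
            }
        }

        synchronized int ops() { return ops; }

        int maxConcurrent() { return maxInside.get(); }
    }

    /** Each task adds documents; with childDocs=true every add also reopens the searcher. */
    static FakeUpdateLog run(int threads, int docsPerThread, boolean childDocs)
            throws InterruptedException {
        FakeUpdateLog ulog = new FakeUpdateLog();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    ulog.add();
                    if (childDocs) {
                        ulog.openRealtimeSearcher();
                    }
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return ulog;
    }

    public static void main(String[] args) throws InterruptedException {
        for (boolean children : new boolean[] {false, true}) {
            long start = System.nanoTime();
            FakeUpdateLog ulog = run(4, 10, children);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("childDocs=%b ops=%d maxConcurrent=%d elapsed=%dms%n",
                    children, ulog.ops(), ulog.maxConcurrent(), ms);
        }
    }
}
```

With child documents, the simulated reopens run strictly one after another regardless of the pool size, so the elapsed time grows with threads × docs × sleep instead of shrinking as threads are added; `maxConcurrent` stays at 1 because every method locks the same monitor.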
