This is a classic example of why *every change to default code paths* for core components must be accompanied by performance benchmarks.

On Tue, 20 Oct, 2020, 1:35 pm Thomas Wöckinger, <[email protected]> wrote:

> Did you have time to look at this?
>
> On Tue, Oct 13, 2020 at 2:43 PM David Smiley (Jira) <[email protected]>
> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213093#comment-17213093
>> ]
>>
>> David Smiley commented on SOLR-14923:
>> -------------------------------------
>>
>> I am responsible for this bug, along with [~moshebla], the contributor of
>> SOLR-12638. Perhaps the single most bit of code I've regretted committing
>> on behalf of another are the few lines of code you have found Thomas. I
>> expressed my reservations at the time:
>>
>> https://issues.apache.org/jira/browse/SOLR-12638?focusedCommentId=16872898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16872898
>>
>> bq. What gnaws at me is that this "UpdateLog.openRealtimeSearcher" is
>> being called optimistically on a new doc because maaaayyyybeee some future
>> atomic update will need to see it. And not just any type of atomic update;
>> one that is directly to a nested child doc (something I consider highly
>> experimental). It's as if we're optimizing for making that future atomic
>> update faster by doing work in advance that will, I think, very rarely
>> actually be used. It's a tragedy, if I'm understanding this right.
>>
>> There's a bit of conversation before in the issue about it as well. It's
>> difficult for me to say at the moment what the fix is because that's fairly
>> complex low-level Solr code that I think few people understand well.
>> Nonetheless I'll look into it further this week.
>>
>>
>> > Indexing performance is unacceptable when child documents are involved
>> > ----------------------------------------------------------------------
>> >
>> > Key: SOLR-14923
>> > URL: https://issues.apache.org/jira/browse/SOLR-14923
>> > Project: Solr
>> > Issue Type: Bug
>> > Security Level: Public(Default Security Level. Issues are Public)
>> > Components: update, UpdateRequestProcessors
>> > Affects Versions: master (9.0), 8.3, 8.4, 8.5, 8.6
>> > Reporter: Thomas Wöckinger
>> > Priority: Critical
>> > Labels: performance
>> >
>> > Parallel indexing does not make sense at moment when child documents
>> > are used.
>> > The org.apache.solr.update.processor.DistributedUpdateProcessor checks
>> > at the end of the method doVersionAdd if Ulog caches should be refreshed.
>> > This check will return true if any child document is included in the
>> > AddUpdateCommand.
>> > If so ulog.openRealtimeSearcher(); is called, this call is very
>> > expensive, and executed in a synchronized block of the UpdateLog instance,
>> > therefore all other operations on the UpdateLog are blocked too.
>> > Because every important UpdateLog method (add, delete, ...) is done
>> > using a synchronized block almost each operation is blocked.
>> > This reduces multi threaded index update to a single thread behavior.
>> > The described behavior is not depending on any option of the
>> > UpdateRequest, so it does not make any difference if 'waitFlush',
>> > 'waitSearcher' or 'softCommit' is true or false.
>> > The described behavior makes the usage of ChildDocuments useless,
>> > because the performance is unacceptable.
>> >
>>
>> --
>> This message was sent by Atlassian Jira
>> (v8.3.4#803005)
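
For anyone skimming the thread, here is a rough toy model of the contention pattern described in the issue. It is only a sketch, not Solr code: `ToyUpdateLog`, its two methods, and the 2 ms sleep are hypothetical stand-ins I made up for illustration. The only things taken from the report are that the cheap operations and the expensive `openRealtimeSearcher` call share one lock, and that the expensive call runs for every document that carries a child.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: ToyUpdateLog is a hypothetical stand-in, NOT Solr's UpdateLog.
final class UlogContentionSketch {

    static final class ToyUpdateLog {
        private final AtomicInteger inside = new AtomicInteger();
        private final AtomicInteger maxInside = new AtomicInteger();

        // Cheap operation, but it needs the same monitor as the expensive one.
        synchronized void add() {
            track(0);
        }

        // Stand-in for the expensive call; the sleep simulates its cost (assumed value).
        synchronized void openRealtimeSearcher() {
            track(2);
        }

        // Records how many threads are inside the log at the same moment.
        private void track(long millis) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
            try {
                if (millis > 0) {
                    Thread.sleep(millis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inside.decrementAndGet();
            }
        }

        int maxConcurrentHolders() {
            return maxInside.get();
        }
    }

    // Every simulated document has a child, so every add is followed by the expensive call.
    static int run(int threads, int docsPerThread) {
        ToyUpdateLog ulog = new ToyUpdateLog();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    ulog.add();
                    ulog.openRealtimeSearcher();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return ulog.maxConcurrentHolders();
    }

    public static void main(String[] args) {
        System.out.println("max threads inside the log at once: " + run(8, 10));
    }
}
```

However many indexing threads you start, at most one of them is ever inside the toy log, so total time is bounded below by the sum of all the simulated searcher reopens. That is the "single thread behavior" the report describes, and it is the kind of regression a multi-threaded indexing benchmark with child documents would likely have caught.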
