[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248302#comment-17248302 ]
David Smiley commented on SOLR-14923:
-------------------------------------

Hey, nice PR! It's not as hacky as I feared it might be: just an AtomicBoolean (no potentially unbounded Map). I really appreciate your performance benchmarks to prove this out.

I'm going to look a bit further this weekend to see if the openRealtimeSearcher call can be avoided in more cases. For example, in RTG.getInputDocument, you added the potential open _before_ the check of whether the doc from the updateLog is null ({{sid == null}}), but shouldn't we skip it when sid isn't null? If so, move it down right below that check, indented inside the branch. Also, maybe in DUP we can sometimes further restrict when this logic happens, perhaps only when the incoming document is an atomic update. I'll investigate that.

Earlier in this issue, you tried modifying UpdateLog.openRealtimeSearcher to move the searcher re-open outside of the synchronized block. That makes sense to me; we should do that.

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd whether the ulog caches should be refreshed.
> This check will return true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive
> and is executed in a synchronized block of the UpdateLog instance, so all
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest,
> so it makes no difference whether 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the
> performance is unacceptable.
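
The locking idea discussed in this thread (record with an AtomicBoolean that a re-open may be needed, and do the expensive re-open outside the monitor that guards add/delete) can be sketched roughly as follows. This is only a hypothetical illustration, not the code from the PR or from Solr: SketchUpdateLog, reopenIfNeeded, expensiveReopen and the other names are invented for the example, and the "searcher" is simulated by a plain list snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an update log. NOT Solr's UpdateLog.
class SketchUpdateLog {
  private final List<String> entries = new ArrayList<>();  // guarded by "this"
  private final Object reopenLock = new Object();          // serializes re-opens only
  private final AtomicBoolean reopenNeeded = new AtomicBoolean(false);
  private final AtomicInteger reopenCount = new AtomicInteger();
  private volatile List<String> visible = new ArrayList<>(); // simulated searcher view

  // Writers hold the log's monitor only for cheap bookkeeping.
  void add(String docId, boolean hasChildDocs) {
    synchronized (this) {
      entries.add(docId);
    }
    if (hasChildDocs) {
      reopenNeeded.set(true); // remember that a re-open is required; do not do it here
    }
  }

  // Readers that need a fresh view call this; writers never take reopenLock.
  void reopenIfNeeded() {
    synchronized (reopenLock) {
      // Clear the flag before re-opening so that an add arriving during the
      // re-open sets it again and triggers another re-open later.
      if (reopenNeeded.compareAndSet(true, false)) {
        expensiveReopen();
      }
    }
  }

  private void expensiveReopen() {
    List<String> snapshot;
    synchronized (this) {           // short critical section: copy state only
      snapshot = new ArrayList<>(entries);
    }
    // The slow part (opening a searcher in the real system) would run here,
    // outside the monitor that add() uses.
    visible = snapshot;
    reopenCount.incrementAndGet();
  }

  int reopens() { return reopenCount.get(); }
  int visibleSize() { return visible.size(); }

  public static void main(String[] args) {
    SketchUpdateLog log = new SketchUpdateLog();
    log.add("a", false);
    log.add("b", true);
    log.add("c", true);
    log.reopenIfNeeded();
    log.reopenIfNeeded(); // flag already cleared, so no second re-open
    System.out.println("reopens=" + log.reopens() + " visible=" + log.visibleSize());
  }
}
```

In this sketch, several adds with child documents collapse into a single re-open, and that re-open happens only when a reader asks for a fresh view. Whether the same split is safe for the real UpdateLog depends on invariants that the sketch does not model.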