[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226412#comment-17226412 ]
David Smiley commented on SOLR-14923:
-------------------------------------

I've had trouble prioritizing this because it requires many hours to investigate through code I don't like. I'll try to give you some answers without (yet) really digging in:

bq. Would it be sufficient to track the document ids which require a reload and clear them on each openRealTimeSearcher call?

Where would the ID tracking you refer to _go_ (whose responsibility is it)? I don't think UpdateLog. org.apache.solr.update.processor.DistributedUpdateProcessor is doing a lot already. Thinking back to my suggestion on SOLR-12638, I think I was referring to RTGComponent, because Mosh said that it was the component involved in this use-case. And I was not imagining tracking an ever-growing list of IDs somewhere; I think just some sort of dirty flag on RTGComponent. See the variable "mustUseRealtimeSearcher" there -- maybe we could make it get and clear some AtomicReference<Boolean> or something. It's worth a shot but it feels inelegant...

I lack the deeper understanding as to why UpdateLog.openRealtimeSearcher must be called at all. Mosh at the time said "RTGComponent is not aware of the newly indexed yet not committed child docs.". This is foggy to me, but I don't know why RTGComponent should be aware at all; I don't recall how RTGComponent is involved in the whole thing. Maybe between you and me, we shall figure this out :-)

[~markrmil...@gmail.com]: AFAICT you originally added {{UpdateLog.openRealtimeSearcher}}. Why is it located _there_ instead of, say, UpdateHandler? I'm honestly confused that UpdateLog refers to the index at all; according to my conceptual understanding, it should be independent of it. When there isn't an updateLog (it's technically optional), there may be a bug, because the reader probably still needs to be re-opened.

bq. What should be the result of two concurrent updates on the same document?
I think it is the same as with normal atomic updates, and due to the fact that there is no rollback on transactions, this can only be detected by versioning.

Yes; that's logical to me.

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: update, UpdateRequestProcessors
>   Affects Versions: master (9.0), 8.3, 8.4, 8.5, 8.6
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance
>
> Parallel indexing does not make sense at the moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the
> end of the method doVersionAdd whether the Ulog caches should be refreshed.
> This check returns true if any child document is included in the
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive
> and is executed in a synchronized block on the UpdateLog instance, so all
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) runs in a
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest,
> so it makes no difference whether 'waitFlush', 'waitSearcher' or
> 'softCommit' is true or false.
> The described behavior makes ChildDocuments useless in practice, because the
> performance is unacceptable.
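
For illustration only, the "dirty flag" idea floated in the comment (get and clear a flag instead of reopening the realtime searcher on every add with child documents) could be sketched roughly as below. This is not Solr code: the class and method names (RealtimeReopenFlag, markDirty, reopenIfDirty) are hypothetical, an AtomicBoolean is used in place of the AtomicReference<Boolean> mentioned above, and the counter merely stands in for the expensive UpdateLog.openRealtimeSearcher() call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names exist in Solr.
class RealtimeReopenFlag {
  // Set by indexing threads when an add contains child documents.
  private final AtomicBoolean dirty = new AtomicBoolean(false);
  // Counts simulated reopens, only so the demo can show the effect.
  private final AtomicInteger reopenCount = new AtomicInteger();

  // Assumed to be called on the update path instead of reopening eagerly.
  void markDirty() {
    dirty.set(true);
  }

  // Assumed to be called on the read path. Atomically reads and clears the
  // flag; returns true if a (simulated) reopen was performed.
  boolean reopenIfDirty() {
    if (dirty.getAndSet(false)) {
      // Stand-in for the expensive realtime searcher reopen.
      reopenCount.incrementAndGet();
      return true;
    }
    return false;
  }

  int reopenCount() {
    return reopenCount.get();
  }

  public static void main(String[] args) {
    RealtimeReopenFlag flag = new RealtimeReopenFlag();
    // Many adds with child documents only flip the flag.
    for (int i = 0; i < 1000; i++) {
      flag.markDirty();
    }
    // The first read pays for a single reopen; the second finds a clean flag.
    flag.reopenIfDirty();
    flag.reopenIfDirty();
    System.out.println("reopens: " + flag.reopenCount());
  }
}
```

In this sketch the flag is cleared before the simulated reopen, so an update that marks it dirty while a reopen is in progress triggers another reopen on the next read rather than being lost. Whether the real read path can safely defer the reopen like this is exactly the open question raised in the comment.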