[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260982#comment-17260982
 ] 

ASF subversion and git services commented on SOLR-14923:
--------------------------------------------------------

Commit 4cb3ad4a1c40b4326aec64577a7e60018f7f1a5e in lucene-solr's branch 
refs/heads/master from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4cb3ad4 ]

* SOLR-14923: Nested docs indexing perf & robustness (#2159)

* When the schema defines _root_, and you want to do atomic/partial updates...
** _root_ needn't be stored or have docValues any more
** _nest_path_ field isn't needed for this any more
** Simplified internal logic
* Allow (and recommend, eventually insist) that the _root_ field be passed for 
atomic/partial updates to child docs.
** In the absence of _root_, assume the _route_ param is equivalent to 
ameliorate back-compat scope.  This is a temporary hack; remove in SOLR-15064.
** One of the two is required; you'll get an exception if the assumption is 
false.  THIS IS A BACK-COMPAT CHANGE
* Ensure that the update log contains the _root_ field if it's defined in the 
schema; in some cases it wasn't.  It's important for robustness of 
atomic/partial updates to child docs.  Caveat: the buffer replay scenario is 
not tested with child docs.
* Limited the cases when a realtime searcher is re-opened.  It was being 
applied to any update that included child docs but now only some narrow subset: 
only for atomic/partial updates, and when the update log contains an in-place 
update for the same nest because it's complicated to resolve those log entries.
* Internal improvements to RealTimeGetComponent to aid clarity & robustness & 
probably performance...
** Use SolrDocumentFetcher.solrDoc(docID, ReturnFields) instead of more manual 
loading.  Will do more with this in another PR.
** Clarify when only root doc IDs are expected.
** Use Resolution enum more, add PARTIAL, remove DOC_WITH_CHILDREN; enhance 
docs.
** When have ReturnFields, a Set of "onlyTheseFields" becomes redundant.  Add a 
child doc resolution via a transformer when needed.
** Clarified where copy-field targets are removed
* NestPathField should default to single valued, instead of inheriting the 
schema default, which for ancient schemas was multi-valued.
* AddUpdateCommand.getLuceneDocument(s) methods are very internal; made package 
visible and refactored a bit for clarity
* DocumentBuilder: when in-place update, skip id and _root_ here, thus also 
simplifying further logic
* NestedShardedAtomicUpdateTest no longer extends AbstractFullDistribZkTestBase 
because it wasn't really leveraging the "control client" checking, and it added 
too much complexity to debug failures.

> Indexing performance is unacceptable when child documents are involved
> ----------------------------------------------------------------------
>
>                 Key: SOLR-14923
>                 URL: https://issues.apache.org/jira/browse/SOLR-14923
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: update, UpdateRequestProcessors
>    Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>            Reporter: Thomas Wöckinger
>            Priority: Critical
>              Labels: performance, pull-request-available
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at moment when child documents are used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the 
> AddUpdateCommand.
> If so ulog.openRealtimeSearcher(); is called, this call is very expensive, 
> and executed in a synchronized block of the UpdateLog instance, therefore all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a 
> synchronized block almost each operation is blocked.
> This reduces multi threaded index update to a single thread behavior.
> The described behavior is not depending on any option of the UpdateRequest, 
> so it does not make any difference if 'waitFlush', 'waitSearcher' or 
> 'softCommit'  is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the 
> performance is unacceptable.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to