[ 
https://issues.apache.org/jira/browse/SOLR-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730247#comment-17730247
 ] 

Rahul Goswami edited comment on SOLR-16838 at 6/7/23 6:41 PM:
--------------------------------------------------------------

The regression seems to be in the Lucene layer. Quoting the discussion on this 
issue on the Lucene list:

" - 8.0 moved the terms index off-heap for non-PK fields with
MMapDirectory. [https://github.com/apache/lucene/issues/9681]
 - Then in 8.6 the FST was moved off-heap all the time.
[https://github.com/apache/lucene/issues/10297]"

So the terms index is now off-heap, and because Lucene's FST reads bytes 
backwards, readByte() calls cause a disk access for every 1 kB of buffer. 
Adrien Grand has opened the following tickets for further discussion:

[https://github.com/apache/lucene/issues/12355] and
[https://github.com/apache/lucene/issues/12356].
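
To make the "disk access for every 1 kB of buffer" point concrete, here is a minimal toy model in Python. It is not Lucene's actual buffering code: the class name, the block-aligned buffer, and the refill counter are all assumptions made for illustration, with each refill standing in for one disk access.

```python
BUFFER_SIZE = 1024  # 1 kB, the buffer size mentioned in the comment above


class CountingBufferedReader:
    """Toy buffered reader over a bytes object that counts buffer refills.

    Each refill stands in for one disk access. The block-aligned buffer is
    an assumption of this sketch, not a description of Lucene's logic.
    """

    def __init__(self, data, buffer_size=BUFFER_SIZE):
        self.data = data
        self.buffer_size = buffer_size
        self.buffer_start = -1  # no block loaded yet
        self.refills = 0

    def read_byte(self, pos):
        block_start = (pos // self.buffer_size) * self.buffer_size
        if block_start != self.buffer_start:
            # Simulated disk access: load the block that contains pos.
            self.buffer_start = block_start
            self.refills += 1
        return self.data[pos]


def count_backward_refills(size, buffer_size=BUFFER_SIZE):
    """Read `size` bytes from the last one to the first; return the refills."""
    reader = CountingBufferedReader(bytes(size), buffer_size)
    for pos in range(size - 1, -1, -1):
        reader.read_byte(pos)
    return reader.refills
```

Under these assumptions, `count_backward_refills(4096)` reports 4 simulated disk accesses, one per 1 kB block, whereas the same reads against a structure held on-heap would need none.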


> Atomic updates too slow in Solr 8 vs Solr 7
> -------------------------------------------
>
>                 Key: SOLR-16838
>                 URL: https://issues.apache.org/jira/browse/SOLR-16838
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SearchComponents - other
>    Affects Versions: 8.11.1
>            Reporter: Rahul Goswami
>            Priority: Major
>              Labels: RTG, RealTimeGet, atomicupdate
>
> We started experiencing slow updates in production after upgrading from 
> Solr 7.7.2 to 8.11.1. Comparing the two versions, indexing 20 million docs 
> via atomic updates through the same client program (running 15 parallel 
> threads indexing in batches of 1000) takes the following time:
>  
> Solr 7: 78 mins
> Solr 8: 370 mins
>  
> Environment details:
> - Java 11 on Windows Server
> - JVM heap: -Xms1536m -Xmx3072m
> - Indexing client code running 15 parallel threads indexing in batches of 1000
> - Using SimpleFSDirectoryFactory (since mmap doesn't work well on Windows 
> for our index sizes, which commonly run north of 1 TB)
>  
> Looking at the thread dump, the bottleneck seems to be RealTimeGet, and I can 
> see that Solr 7 takes a different code path than Solr 8. Note that the 
> performance of regular (non-atomic) updates is still good on Solr 8, 
> completing in under 1 hour for the same data set of 20 million docs.
>  
> Sharing the indexing code, solrconfig, schema and thread dumps in the link 
> below:
> [https://drive.google.com/drive/folders/1q2DPNTYQEU6fi3NeXIKJhaoq3KPnms0h?usp=sharing]
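
For readers unfamiliar with the client setup described in the issue, the following is a rough Python sketch of batching atomic updates across 15 worker threads. It is not the reporter's indexing code (that is in the shared folder above). The field name `status_s`, the `id` unique key, and the `send_batch` stub are hypothetical; a real client would POST each JSON payload to the collection's update handler instead of just counting documents.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def make_atomic_update(doc_id, field, value):
    """Build one atomic-update document; "set" replaces the field's value."""
    return {"id": doc_id, field: {"set": value}}


def make_batches(docs, batch_size=1000):
    """Split the documents into JSON payloads of at most batch_size docs."""
    return [
        json.dumps(docs[start:start + batch_size])
        for start in range(0, len(docs), batch_size)
    ]


def send_batch(payload):
    """Hypothetical stand-in for the HTTP POST; returns the doc count."""
    return len(json.loads(payload))


def index_all(docs, threads=15, batch_size=1000):
    """Send all batches through a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send_batch, make_batches(docs, batch_size)))
```

Because each atomic update carries only the changed field, Solr has to fetch the existing document to rebuild it, which fits the RealTimeGet bottleneck the reporter sees in the thread dump.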


