[jira] [Commented] (MINDEXER-151) Speed up Index update from remote

Olivier Lamy (Jira) Wed, 11 Jun 2025 23:29:50 -0700


    [ 
https://issues.apache.org/jira/browse/MINDEXER-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17965453#comment-17965453
 ]


Olivier Lamy commented on MINDEXER-151:
---------------------------------------

This project has moved from Jira to GitHub Issues. This issue was migrated to 
[apache/maven-indexer#523|https://github.com/apache/maven-indexer/issues/523]. 

> Speed up Index update from remote
> ---------------------------------
>
>                 Key: MINDEXER-151
>                 URL: https://issues.apache.org/jira/browse/MINDEXER-151
>             Project: Maven Indexer (Moved to GitHub Issues)
>          Issue Type: Improvement
>            Reporter: Tamas Cservenak
>            Assignee: Tamas Cservenak
>            Priority: Major
>             Fix For: 7.0.0
>
>
> Currently, if you run BasicUsageExample from the examples, it performs a 
> "full" update, and the full update (going from an "empty" index to an "up 
> to date" index) takes 15 minutes or more. Yes, the Central index is huge, but 
> there is room for improvement.
> Steps performed during an update:
>  * the properties file is downloaded
>  * the GZ file(s) are downloaded (depending on whether the update is 
> incremental or full)
>  * the GZ files are processed into a temporary Lucene index
>  * the index of the target (updated) indexing context is "replaced" with (or 
> merged with, depending on the case) the temporary Lucene index
> Downloading the files takes several seconds; it is the processing of the raw 
> GZIP records into the Lucene index that takes a long time. This can be improved.
> IndexUpdateRequest got a new field {{int threads}} with a default value of 1 
> (which behaves as before). When it is set to a value greater than 1 (accepted 
> values are positive numbers), {{IndexDataReader}} behaves slightly 
> differently than with threads=1: it creates N (threads) "silo" indexes, 
> spawns N threads, and processes the input file on those N threads into the N 
> silos, which are merged at the end. This should improve update times for huge 
> indexes; experiments show it can ideally halve them (on my HW, 4 threads is 
> the ideal setting and halves the full index update time).
> Using very large numbers may make things worse, as time may be spent 
> managing/merging silos, so the "sweet spot" is probably HW dependent.
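The multi-threaded "silo" approach described in the quoted issue can be illustrated with a minimal sketch. This is a hypothetical illustration, not the actual IndexDataReader code: plain Java maps stand in for the Lucene silo indexes, the names SiloIndexSketch, buildIndex and toEntry are invented for this example, and records are assumed to be assigned to workers round-robin, which the issue does not specify. The threads parameter mirrors the new int threads field: 1 gives a single silo, larger values give one silo per worker thread, and all silos are merged once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical illustration of the "silo" idea: N workers each build a private
 * partial index (a silo) from their share of the input records, and the silos
 * are merged at the end. TreeMaps stand in for Lucene indexes; this is NOT the
 * maven-indexer implementation.
 */
public class SiloIndexSketch {

    /** Hypothetical stand-in for turning one raw GZ record into an index entry. */
    static String toEntry(String record) {
        return record.trim().toLowerCase();
    }

    /** Builds an entry -> count map using `threads` silos, one per worker. */
    static Map<String, Integer> buildIndex(List<String> records, int threads) throws Exception {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be a positive number");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> silos = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                silos.add(pool.submit(() -> {
                    // Each worker writes only to its own silo, so no locking is needed.
                    Map<String, Integer> silo = new TreeMap<>();
                    // Assumed round-robin split: worker k takes records k, k+N, k+2N, ...
                    for (int i = worker; i < records.size(); i += threads) {
                        silo.merge(toEntry(records.get(i)), 1, Integer::sum);
                    }
                    return silo;
                }));
            }
            // Merge step: fold every silo into the final index.
            Map<String, Integer> merged = new TreeMap<>();
            for (Future<Map<String, Integer>> f : silos) {
                f.get().forEach((k, v) -> merged.merge(k, v, Integer::sum));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = List.of("junit", "Guava", "junit ", "slf4j", "guava");
        System.out.println(buildIndex(records, 1)); // {guava=2, junit=2, slf4j=1}
        System.out.println(buildIndex(records, 4)); // {guava=2, junit=2, slf4j=1}
    }
}
```

Both calls in main produce the same merged result; only how the work is split differs. As the issue notes, the best thread count is hardware dependent, because extra silos add management and merge cost, so timing a few values (for example 1, 2, 4 and 8) on the target machine is a reasonable way to find it.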



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
