[
https://issues.apache.org/jira/browse/MINDEXER-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17965453#comment-17965453
]
Olivier Lamy commented on MINDEXER-151:
---------------------------------------
This project has moved from Jira to GitHub Issues. This issue was migrated to
[apache/maven-indexer#523|https://github.com/apache/maven-indexer/issues/523].
> Speed up Index update from remote
> ---------------------------------
>
> Key: MINDEXER-151
> URL: https://issues.apache.org/jira/browse/MINDEXER-151
> Project: Maven Indexer (Moved to GitHub Issues)
> Issue Type: Improvement
> Reporter: Tamas Cservenak
> Assignee: Tamas Cservenak
> Priority: Major
> Fix For: 7.0.0
>
>
> Currently, running BasicUsageExample from the examples performs a "full"
> update, and that full update (going from an "empty" index to an "up to
> date" index) takes 15 minutes or more. Yes, the Central index is huge, but
> there is room for improvement.
> Steps performed during an update:
> * the properties file is downloaded
> * the GZ file(s) are downloaded (depending on whether the update is
> incremental or full)
> * the GZ files are processed into a temporary Lucene index
> * the index of the target indexing context (the one being updated) is
> "replaced" by, or merged with, the temporary Lucene index (it depends)
> Downloading the files takes several seconds; it is the processing of the raw
> GZIP records into the Lucene index that takes a long time. This can be improved.
> IndexUpdateRequest got a new field {{int threads}} with a default value of 1
> (the behavior is the same as before). When set to a value greater than 1
> (accepted values are positive numbers), {{IndexDataReader}} will behave
> slightly differently than with threads=1: it will create N (threads) "silo"
> indexes, spawn N threads, and process the input file on N threads into N
> silos that are merged at the end. This should shorten long update times (as
> the index is huge as well), ideally halving them, as experiments show (on my
> HW the ideal is 4 threads, which halves the full index update time).
> Using very large numbers may make things worse, as time may be spent on
> managing/merging silos, so the "sweet spot" is probably HW dependent.
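
The silo approach described above can be sketched in plain Java. This is only
an illustration under stated assumptions, not the Maven Indexer implementation:
the class SiloMergeSketch, the method process, the sentinel value, and the use
of in-memory lists in place of Lucene indexes are all hypothetical. N workers
drain a shared queue into N private silos, and the silos are merged once the
input is exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: in-memory lists stand in for the per-thread "silo" indexes.
final class SiloMergeSketch {
    // Unique sentinel (compared by identity) telling a worker the input is exhausted.
    private static final String EOF = new String("<eof>");

    static List<String> process(Iterable<String> records, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be positive");
        }
        // Bounded queue so the reader cannot run far ahead of the workers.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Spawn N workers; each one fills its own private silo.
            List<Future<List<String>>> silos = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                silos.add(pool.submit(() -> {
                    List<String> silo = new ArrayList<>();
                    for (String r = queue.take(); r != EOF; r = queue.take()) {
                        // Stand-in for the expensive "index this record" step.
                        silo.add(r.toUpperCase(Locale.ROOT));
                    }
                    return silo;
                }));
            }
            // A single reader feeds the input to whichever worker is free.
            for (String record : records) {
                queue.put(record);
            }
            // One sentinel per worker so that every worker terminates.
            for (int i = 0; i < threads; i++) {
                queue.put(EOF);
            }
            // Merge the N silos at the end (input order is not preserved).
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> silo : silos) {
                merged.addAll(silo.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            // Interrupts any worker still blocked on the queue after a failure.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> merged = process(List.of("g:a:1", "g:a:2", "g:a:3"), 2);
        System.out.println(merged.size());
    }
}
```

With threads=1 the sketch degenerates to one worker and one silo, which matches
the "same as before" default. The final merge is sequential, which is one
reason why a very large thread count may stop paying off.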
--
This message was sent by Atlassian Jira
(v8.20.10#820010)