[ https://issues.apache.org/jira/browse/MINDEXER-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tamas Cservenak closed MINDEXER-151.
------------------------------------
Resolution: Fixed

> Speed up Index update from remote
> ---------------------------------
>
> Key: MINDEXER-151
> URL: https://issues.apache.org/jira/browse/MINDEXER-151
> Project: Maven Indexer
> Issue Type: Improvement
> Reporter: Tamas Cservenak
> Assignee: Tamas Cservenak
> Priority: Major
> Fix For: 7.0.0
>
>
> Currently, if you run the BasicUsageExample from the examples, it performs
> a "full" update, and the full update (going from an "empty" index to an
> "up to date" index) takes 15 minutes or more. Yes, the Central index is
> huge, but there is room for improvement.
> Steps happening during update(s):
> * the properties file is downloaded
> * the GZ file(s) are downloaded (which ones depends on whether the update
> is incremental or full)
> * the GZ files are processed into a temporary Lucene index
> * the target indexing context index (the one being updated) is "replaced"
> (or merged, depending on the case) with the temporary Lucene index
> Downloading the files takes several seconds, but it is the processing of
> the GZIP raw records into the Lucene index that takes a long time. This can
> be improved.
> IndexUpdateRequest got a new field {{int threads}} with a default value of
> 1 (same behavior as before). When set to something greater than 1 (accepted
> values are positive numbers), {{IndexDataReader}} behaves slightly
> differently than with threads=1: it creates N (threads) "silo" indexes,
> spawns N threads, and processes the input file on N threads into N silos
> that are merged at the end. This should improve long update times (as the
> index is huge as well), ideally halving them, as experiments show (on my HW
> the ideal is 4 threads, which halves the full index update time).
> Using very large numbers may make things worse, as time may be spent on
> managing/merging silos, so the "sweet spot" is probably HW dependent.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
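
The "N threads into N silos, merged at the end" idea described in the issue can be sketched with the plain JDK. This is not the {{IndexDataReader}} code: the class name SiloIndexSketch, the method index, and the "groupId:artifactId:version" record format are hypothetical, and in-memory maps stand in for the Lucene silo indexes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of the silo pattern; not part of Maven Indexer.
final class SiloIndexSketch {

    // Groups versions under "groupId:artifactId".
    // Assumes every record contains at least one ':' separator.
    static Map<String, List<String>> index(List<String> records, int threads)
            throws InterruptedException, ExecutionException {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be positive");
        }
        // One shared input that all workers drain, like a single input file.
        Queue<String> input = new ConcurrentLinkedQueue<>(records);

        // Each worker fills its own private silo, so no locking is needed
        // while records are being processed.
        Callable<Map<String, List<String>>> worker = () -> {
            Map<String, List<String>> silo = new HashMap<>();
            String record;
            while ((record = input.poll()) != null) {
                int cut = record.lastIndexOf(':');
                silo.computeIfAbsent(record.substring(0, cut), k -> new ArrayList<>())
                    .add(record.substring(cut + 1));
            }
            return silo;
        };

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, List<String>>>> silos = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                silos.add(pool.submit(worker));
            }
            // Merge step: runs on one thread after the workers finish.
            // Its cost grows with the number of silos.
            Map<String, List<String>> merged = new TreeMap<>();
            for (Future<Map<String, List<String>>> silo : silos) {
                for (Map.Entry<String, List<String>> e : silo.get().entrySet()) {
                    merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                          .addAll(e.getValue());
                }
            }
            // Sort so the result does not depend on thread scheduling.
            merged.values().forEach(Collections::sort);
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = List.of("org.a:x:1.1", "org.a:x:1.0", "org.b:y:2.0");
        System.out.println(index(records, 4));
    }
}
```

With threads=1 the sketch degenerates to a single silo and a trivial merge, which mirrors the "same as before" default. The merge loop is the part that runs single-threaded, which is one plausible reason very large thread counts stop helping.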