GitHub user tstupka opened a pull request:
https://github.com/apache/maven-indexer/pull/12
Resolve performance loss due to Lucene 4.8.1: upgrade to Lucene 5.5.3 and
additional fixes
Since Lucene was upgraded to 4.8.1, the indexer takes 2.5x longer than with
Lucene 3.6. This seems to be the cumulative effect of smaller performance
regressions introduced in individual Lucene releases after 3.6 - see also
https://issues.apache.org/jira/browse/MINDEXER-99
See the individual commits in this request. Each one addresses a specific
improvement. With all of them applied, the resulting performance is
comparable to the performance before the above-mentioned upgrade.
#0bb9484 - upgrading lucene from 4.8.1 to 5.5.3
Performance was improved in Lucene 5.x.
With 5.5.3 the indexer works significantly faster than with 4.8.1.
#3cfa430 - avoid rebuilding groups after reading index
After the index was generated, it had to be re-read one more time to extract
a distinct list of allGroups and rootGroups, even though that information had
already been available during the first read and was thrown away.
#8b98a49 - improve reading from zip file
#4062146 - do not unnecessarily force merge on index writer
A force merge is very expensive - let's trust Lucene to merge when it sees
fit. The final index size without force merges was 910 MB, compared to
900 MB with force merges (about 1% larger).
The time improvement is approx. 30%.
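
To illustrate the idea behind the "avoid rebuilding groups after reading
index" commit, here is a minimal sketch that gathers allGroups and rootGroups
during the same pass that reads the records, so no second read is needed.
The class name GroupCollector and its methods are hypothetical and are not
taken from the patch; the sketch also assumes that a root group is the part
of the groupId before the first dot.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of maven-indexer: collects group information
// while records are being read instead of re-reading the finished index.
final class GroupCollector {
    private final Set<String> allGroups = new TreeSet<>();
    private final Set<String> rootGroups = new TreeSet<>();

    // Call once per artifact record during the single read pass.
    void accept(String groupId) {
        if (groupId == null || groupId.isEmpty()) {
            return; // records without a groupId contribute nothing
        }
        allGroups.add(groupId);
        // Assumption: the root group is the segment before the first '.'.
        int dot = groupId.indexOf('.');
        rootGroups.add(dot < 0 ? groupId : groupId.substring(0, dot));
    }

    Set<String> getAllGroups() {
        return allGroups;
    }

    Set<String> getRootGroups() {
        return rootGroups;
    }

    public static void main(String[] args) {
        // Stand-in for the groupIds of records streamed from an index.
        List<String> groupIds = Arrays.asList(
                "org.apache.maven", "org.apache.lucene", "junit", "org.apache.maven");
        GroupCollector collector = new GroupCollector();
        for (String groupId : groupIds) {
            collector.accept(groupId);
        }
        System.out.println(collector.getAllGroups());  // [junit, org.apache.lucene, org.apache.maven]
        System.out.println(collector.getRootGroups()); // [junit, org]
    }
}
```

The sets deduplicate as records arrive, so the extra cost per record is one
or two set insertions rather than a full additional pass over the index.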
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tstupka/maven-indexer lucene_553
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/maven-indexer/pull/12.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12
----
commit 0bb9484e6eaea3e7974d7e5c9b5ab3d6802780e9
Author: Tomas Stupka <[email protected]>
Date: 2016-10-24T15:25:46Z
upgrading lucene from 4.8.1 to 5.5.3
commit 3cfa430d71a8d58a0454966c0dd183e37f5fb067
Author: Tomas Stupka <[email protected]>
Date: 2016-10-25T08:47:19Z
avoid rebuilding groups after reading index
commit 8b98a495186cafe20ee6494719185e74813ea15e
Author: Tomas Stupka <[email protected]>
Date: 2016-10-25T09:01:18Z
improve reading from zip file
commit 40621465f3ebf14a89961d07ded0d17a4d2d61bc
Author: Tomas Stupka <[email protected]>
Date: 2016-10-25T09:50:29Z
do not unnecessarily force merge on index writer
----