Hi,
We plan to upgrade the Lucene library in our application from 2.4.1 to 3.5.0. I
have been running the benchmark tests that ship with Lucene. To my surprise, I
found that indexing in 3.5.0 is significantly slower than in 2.4.1 for the
Wikipedia data.
Attached is the algorithm file for the tests. The tests used the default Lucene
settings for flush memory size and merge factor. 512M of memory was used for the
tasks. The test machine is a 64-bit Windows 7 machine with an Intel Core i7.
The command:
%ant -Dtask.alg=conf/wikipedia-default.alg -Dtask.mem=512M run-task
Here are the test results:
Lucene 2.4.1
[java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
[java] Operation        round  flush  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
[java] MAddDocs_200000      0  16.00   10       1      200000   1,609.1      124.29   89,218,496  241,631,232
[java] MAddDocs_200000      1  16.00   10       1      200000   1,746.4      114.52  102,365,864  241,762,304
[java] MAddDocs_200000      2  16.00   10       1      200000   1,566.8      127.65   69,428,144  174,194,688
Lucene 2.9.4
[java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
[java] Operation        round  flush  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
[java] MAddDocs_200000      0  16.00   10       1      200000  1,046.49      191.12   82,676,152  139,657,216
[java] MAddDocs_200000      1  16.00   10       1      200000  1,165.35      171.62  119,364,128  156,762,112
[java] MAddDocs_200000      2  16.00   10       1      200000  1,245.86      160.53   50,361,760  137,625,600
Lucene 3.5.0
[java] ------------> Report sum by Prefix (MAddDocs) and Round (3 about 3 out of 14)
[java] Operation        round  flush  mrg  runCnt  recsPerRun     rec/s  elapsedSec   avgUsedMem  avgTotalMem
[java] MAddDocs_200000      0  16.00   10       1      200000    676.48      295.65   70,917,592  129,695,744
[java] MAddDocs_200000      1  16.00   10       1      200000    626.13      319.42   50,329,552   94,240,768
[java] MAddDocs_200000      2  16.00   10       1      200000    687.68      290.83   57,732,640   92,864,512
Indexing with 2.4.1 is roughly 2.3x to 2.8x as fast as with 3.5.0, depending on
the round, and 2.9.4 falls in between. Did I miss any settings or configuration?
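
For anyone who wants to re-check the ratio, here is a minimal Python sketch that recomputes it from the rec/s column of the three reports. The numbers are copied from the MAddDocs_200000 rows above. The names REC_PER_SEC, mean and speed_ratios are made up for this illustration and are not part of the Lucene benchmark package.

```python
# rec/s values copied from the MAddDocs_200000 rows above (rounds 0, 1, 2).
REC_PER_SEC = {
    "2.4.1": [1609.1, 1746.4, 1566.8],
    "2.9.4": [1046.49, 1165.35, 1245.86],
    "3.5.0": [676.48, 626.13, 687.68],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def speed_ratios(baseline, other):
    """Per-round ratio of the baseline version's rec/s to the other version's rec/s."""
    return [b / o for b, o in zip(REC_PER_SEC[baseline], REC_PER_SEC[other])]


if __name__ == "__main__":
    # Mean throughput per version across the three rounds.
    for version, rates in REC_PER_SEC.items():
        print(f"{version}: mean {mean(rates):.1f} rec/s")

    # How many times faster 2.4.1 indexed than 3.5.0, round by round.
    ratios = speed_ratios("2.4.1", "3.5.0")
    print("2.4.1 vs 3.5.0 per round:", ", ".join(f"{r:.2f}x" for r in ratios))
```

The per-round ratio is used rather than a single average because the rounds differ noticeably from each other, so a single figure hides the spread.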
Thanks,
Sean