[ https://issues.apache.org/jira/browse/LUCENE-6183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278455#comment-14278455 ]

Robert Muir commented on LUCENE-6183:
-------------------------------------

I ran a benchmark indexing log data (stored fields only, no actual "indexing").
Stored-fields merging in this case is roughly 5x faster with BEST_SPEED and more than 10x faster with BEST_COMPRESSION. Any differences in disk usage are trivial.

I will also run it with the deflate-6 setting in the patch, but I expect it to be fine.

IndexWriterConfig settings used for the run:
{noformat}
iwc.setMergeScheduler(new SerialMergeScheduler());
iwc.setMaxBufferedDocs(10001);
iwc.setMergePolicy(new LogDocMergePolicy());
{noformat}

{noformat}
BEST_SPEED (lz4)
Trunk:
timeIndexing=578014
timeForceMerging=183421
SM 0 [2015-01-15 04:05:30.380; main]: 114732 msec to merge stored fields [6881288 docs]
-rw-rw-r--  1 rmuir rmuir 4690955837 Jan 15 04:05 _7j0.fdt
-rw-rw-r--  1 rmuir rmuir    2559414 Jan 15 04:05 _7j0.fdx

Patch:
timeIndexing=389148
timeForceMerging=37476
SM 0 [2015-01-15 03:49:20.538; main]: 21690 msec to merge stored fields [6881288 docs]
-rw-rw-r--  1 rmuir rmuir 4691200952 Jan 15 03:49 _6xq.fdt
-rw-rw-r--  1 rmuir rmuir    2557794 Jan 15 03:49 _6xq.fdx

BEST_COMPRESSION (deflate-3)
Trunk:
timeIndexing=586511
timeForceMerging=204363
SM 0 [2015-01-15 03:33:11.906; main]: 130097 msec to merge stored fields [6881288 docs]
-rw-rw-r--  1 rmuir rmuir 2673871545 Jan 15 03:33 _5r6.fdt
-rw-rw-r--  1 rmuir rmuir     731953 Jan 15 03:33 _5r6.fdx

Patch:
timeIndexing=364453
timeForceMerging=19519
SM 0 [2015-01-15 03:41:05.477; main]: 11641 msec to merge stored fields [6881288 docs]
-rw-rw-r--  1 rmuir rmuir 2674305752 Jan 15 03:41 _6cg.fdt
-rw-rw-r--  1 rmuir rmuir     735374 Jan 15 03:41 _6cg.fdx
{noformat}
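As a quick check of the "5x", "10x" and "trivial" wording, the ratios can be recomputed from the "msec to merge stored fields" values and the .fdt sizes in the output above. This is just an arithmetic sketch; the class and method names are made up for illustration.

```java
// Hypothetical helper: recomputes the ratios quoted in the comment
// from the benchmark output (merge times in msec, .fdt sizes in bytes).
public class MergeBenchmarkRatios {

    // How many times faster the patched merge is than trunk.
    static double speedup(long trunkMsec, long patchMsec) {
        return (double) trunkMsec / patchMsec;
    }

    // Relative change in .fdt size (patch vs. trunk), in percent.
    static double sizeDeltaPercent(long trunkBytes, long patchBytes) {
        return 100.0 * (patchBytes - trunkBytes) / trunkBytes;
    }

    public static void main(String[] args) {
        // "msec to merge stored fields": trunk vs. patch
        System.out.printf("BEST_SPEED merge speedup:          %.2fx%n", speedup(114732, 21690));
        System.out.printf("BEST_COMPRESSION merge speedup:    %.2fx%n", speedup(130097, 11641));
        // .fdt file sizes in bytes: trunk vs. patch
        System.out.printf("BEST_SPEED .fdt size change:       %+.4f%%%n",
                sizeDeltaPercent(4690955837L, 4691200952L));
        System.out.printf("BEST_COMPRESSION .fdt size change: %+.4f%%%n",
                sizeDeltaPercent(2673871545L, 2674305752L));
    }
}
```

This works out to merge speedups of about 5.29x and 11.18x, with the .fdt files growing by only about 0.005% and 0.016% respectively.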

> Avoid re-compression on stored fields merge
> -------------------------------------------
>
>                 Key: LUCENE-6183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6183
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6183.patch
>
>
> We removed this optimization before; it didn't really work right because it
> required things to be "aligned".
> But I think we can do it in a simpler and safer way. This recompression is a big CPU
> hog in merge, and it limits our options compression-wise (especially ones like
> LZ4-HC that are only slower at write time).
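The idea in the description (reuse already-compressed data during a merge instead of decompressing and recompressing it) can be illustrated with a toy model. This is a hypothetical sketch, not Lucene's actual stored-fields writer: it uses java.util.zip deflate levels as a stand-in for the compression mode, treats each chunk as an opaque block, reuses a chunk as-is when it was written with the target level, and otherwise falls back to decompress + recompress. A real merge would also have to deal with deleted documents and chunk layout, which this sketch ignores.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy model of "avoid re-compression on merge" (hypothetical, not Lucene code).
public class ChunkCopyMerge {

    // A compressed block of stored documents, tagged with the deflate level it was written with.
    static final class Chunk {
        final int level;
        final byte[] compressed;
        final int rawLength;

        Chunk(int level, byte[] compressed, int rawLength) {
            this.level = level;
            this.compressed = compressed;
            this.rawLength = rawLength;
        }
    }

    static Chunk compress(byte[] raw, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return new Chunk(level, out.toByteArray(), raw.length);
    }

    static byte[] decompress(Chunk chunk) {
        Inflater inflater = new Inflater();
        inflater.setInput(chunk.compressed);
        byte[] raw = new byte[chunk.rawLength];
        try {
            int off = 0;
            while (off < raw.length && !inflater.finished()) {
                int n = inflater.inflate(raw, off, raw.length - off);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated chunk");
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    // Merges the chunks of several source segments into one target segment.
    static List<Chunk> merge(List<List<Chunk>> segments, int targetLevel) {
        List<Chunk> merged = new ArrayList<>();
        for (List<Chunk> segment : segments) {
            for (Chunk chunk : segment) {
                if (chunk.level == targetLevel) {
                    // Same settings: reuse the compressed bytes as-is, no inflate/deflate work.
                    merged.add(chunk);
                } else {
                    // Settings differ: decompress and recompress at the target level.
                    merged.add(compress(decompress(chunk), targetLevel));
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Chunk a = compress("log line A".getBytes(StandardCharsets.UTF_8), 1);
        Chunk b = compress("log line B".getBytes(StandardCharsets.UTF_8), 6);
        List<Chunk> merged = merge(List.of(List.of(a), List.of(b)), 1);
        System.out.println("first chunk reused as-is: " + (merged.get(0) == a));
        System.out.println("second chunk recompressed at level " + merged.get(1).level);
        System.out.println(new String(decompress(merged.get(1)), StandardCharsets.UTF_8));
    }
}
```

In this model, the common case (every source segment written with the same settings) does no compression work at all during the merge, so a compressor that is expensive only at write time would be paid for once, when the data is first written, rather than again on every merge.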



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)