[
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558401#comment-13558401
]
Adrien Grand commented on LUCENE-4599:
--------------------------------------
OK, I think I understand now: I had forgotten to turn debug off, and although
documents in this collection are rather big, queries tend to favor small docs,
whose chunks contain more documents (up to 30). I ran the benchmark again with
a very small chunk size (128) so that chunks would likely contain a single doc,
and the results improved:
{noformat}
Fuzzy2              94.39  (7.8%)    88.33  (7.5%)   -6.4% ( -20% -   9%)
MedTerm            292.09  (2.7%)   279.01  (2.6%)   -4.5% (  -9% -   0%)
OrHighHigh          76.84  (7.4%)    73.58  (5.8%)   -4.2% ( -16% -   9%)
Fuzzy1              93.07  (4.8%)    89.59  (4.4%)   -3.7% ( -12% -   5%)
OrHighMed           69.23  (6.4%)    67.17  (4.9%)   -3.0% ( -13% -   8%)
HighPhrase           8.54  (9.4%)     8.36 (11.6%)   -2.1% ( -21% -  20%)
LowPhrase          125.02  (2.5%)   122.91  (3.4%)   -1.7% (  -7% -   4%)
MedPhrase           39.97  (5.3%)    39.58  (7.6%)   -1.0% ( -13% -  12%)
HighTerm           177.70  (2.4%)   176.21  (2.2%)   -0.8% (  -5% -   3%)
LowTerm            370.26  (3.7%)   367.36  (2.8%)   -0.8% (  -7% -   5%)
OrHighLow          106.08  (5.2%)   105.41  (4.7%)   -0.6% ( -10% -   9%)
LowSloppyPhrase     71.29  (5.2%)    70.95  (5.3%)   -0.5% ( -10% -  10%)
HighSloppyPhrase    30.52  (5.6%)    30.39  (5.2%)   -0.4% ( -10% -  10%)
PKLookup           339.12  (3.0%)   338.09  (3.1%)   -0.3% (  -6% -   5%)
MedSloppyPhrase     71.13  (4.2%)    70.95  (4.4%)   -0.3% (  -8% -   8%)
AndHighLow         259.19  (3.8%)   258.54  (5.1%)   -0.2% (  -8% -   8%)
Respell             69.04  (3.7%)    68.92  (3.2%)   -0.2% (  -6% -   6%)
AndHighHigh         74.49  (1.5%)    74.47  (1.8%)   -0.0% (  -3% -   3%)
Wildcard           157.16  (2.0%)   157.21  (1.9%)    0.0% (  -3% -   3%)
AndHighMed          79.81  (2.1%)    80.16  (1.6%)    0.4% (  -3% -   4%)
MedSpanNear         14.09  (3.6%)    14.16  (4.4%)    0.5% (  -7% -   8%)
Prefix3            281.17  (2.7%)   282.85  (2.5%)    0.6% (  -4% -   5%)
HighSpanNear         7.73  (3.9%)     7.79  (2.8%)    0.8% (  -5% -   7%)
IntNRQ             143.14  (3.0%)   144.45  (3.2%)    0.9% (  -5% -   7%)
LowSpanNear         23.85  (6.6%)    24.36  (6.0%)    2.2% (  -9% -  15%)
{noformat}
(Decreasing the chunk size from 16KB to 128 made the compression ratio increase
from 66% to 68%.)
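
To illustrate why the chunk size matters here, the following is a rough
sketch of how consecutive documents might be grouped into chunks. It is not
the codec's actual code: the flushing rule (close a chunk once its
accumulated size reaches the chunk size), the function name docs_per_chunk,
the assumption that sizes are in bytes, and the 550-byte document size are
all hypothetical and chosen only for illustration.

```python
def docs_per_chunk(doc_sizes, chunk_size):
    """Return the number of documents that land in each chunk.

    Hypothetical rule: documents are appended to the current chunk in
    order, and the chunk is closed as soon as its accumulated size
    reaches chunk_size.
    """
    chunks = []
    doc_count = 0
    byte_count = 0
    for size in doc_sizes:
        doc_count += 1
        byte_count += size
        if byte_count >= chunk_size:
            chunks.append(doc_count)
            doc_count = 0
            byte_count = 0
    if doc_count:
        # Trailing documents that did not fill a whole chunk.
        chunks.append(doc_count)
    return chunks


# Made-up collection of small documents (sizes in bytes, an assumption).
small_docs = [550] * 60

print(docs_per_chunk(small_docs, 16 * 1024))  # many docs share each chunk
print(docs_per_chunk(small_docs, 128))        # every doc gets its own chunk
```

Under this simplified model, fetching one small document from a 16KB chunk
means decompressing the data of every other document in that chunk, whereas
with a chunk size of 128 each chunk holds a single document.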
> Compressed term vectors
> -----------------------
>
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
> Issue Type: Task
> Components: core/codecs, core/termvectors
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 4.2
>
> Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log,
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks,
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch,
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with
> stored fields.