[
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-4599:
---------------------------------
Attachment: highlightNoStop.tasks
I ran the highlight tasks from luceneutil (I had to remove stop words first,
see attached tasks file). The index contains 500k docs from wikibig and fully
fits in the O/S cache.
{noformat}
Task             QPS Lucene40   StdDev  QPS Compressing   StdDev  Pct diff
LowTerm                411.77   (3.8%)           268.23   (2.1%)  -34.9% ( -39% - -30%)
MedTerm                352.85   (5.2%)           242.88   (3.8%)  -31.2% ( -38% - -23%)
HighTerm               231.72   (6.7%)           177.52   (5.8%)  -23.4% ( -33% - -11%)
LowPhrase              226.94   (3.7%)           177.07   (1.9%)  -22.0% ( -26% - -16%)
AndHighMed             136.94   (2.2%)           119.63   (1.8%)  -12.6% ( -16% - -8%)
OrHighLow              124.73   (5.2%)           111.01   (3.6%)  -11.0% ( -18% - -2%)
OrHighMed               82.16   (6.8%)            75.91   (4.7%)   -7.6% ( -17% - 4%)
OrHighHigh              73.41   (5.9%)            68.18   (6.1%)   -7.1% ( -18% - 5%)
MedPhrase               34.41   (5.5%)            32.28   (8.1%)   -6.2% ( -18% - 7%)
HighPhrase              44.10   (3.9%)            41.47   (4.9%)   -6.0% ( -14% - 2%)
LowSloppyPhrase         36.07   (4.8%)            33.93   (5.4%)   -5.9% ( -15% - 4%)
AndHighHigh             49.13   (2.6%)            46.27   (2.2%)   -5.8% ( -10% - -1%)
MedSloppyPhrase         12.84   (3.8%)            12.20   (6.2%)   -5.0% ( -14% - 5%)
HighSloppyPhrase        13.20   (4.6%)            12.58   (6.5%)   -4.7% ( -15% - 6%)
LowSpanNear              7.94  (10.7%)             7.69   (8.2%)   -3.2% ( -19% - 17%)
HighSpanNear             5.32   (3.6%)             5.24   (4.4%)   -1.6% ( -9% - 6%)
AndHighLow            3780.85   (4.0%)          3756.77   (7.5%)   -0.6% ( -11% - 11%)
PKLookup               341.94   (2.2%)           340.85   (2.4%)   -0.3% ( -4% - 4%)
Prefix3                122.64   (3.8%)           122.60   (4.3%)   -0.0% ( -7% - 8%)
Wildcard               188.27   (3.2%)           188.23   (3.2%)   -0.0% ( -6% - 6%)
IntNRQ                 136.55   (7.2%)           137.57   (7.4%)    0.7% ( -12% - 16%)
MedSpanNear             40.54   (6.0%)            40.94   (6.4%)    1.0% ( -10% - 14%)
Respell                 58.13   (4.0%)            59.35   (3.5%)    2.1% ( -5% - 9%)
Fuzzy2                  55.24   (5.7%)            57.72   (7.8%)    4.5% ( -8% - 19%)
Fuzzy1                  76.40   (5.9%)            83.67   (4.0%)    9.5% ( 0% - 20%)
{noformat}
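As a reading aid for the table above: the Pct diff column appears to be the relative change in mean QPS going from Lucene40 to Compressing. The Python sketch below assumes that definition and checks it against three rows copied from the table. The name pct_diff is illustrative and is not part of luceneutil, and the sketch does not attempt to reproduce the range shown in parentheses.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate vs. baseline, in percent.

    Assumption: this is how the 'Pct diff' column is derived from the
    two mean-QPS columns; it is not taken from the luceneutil source.
    """
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# (task, QPS Lucene40, QPS Compressing), copied from the table above.
rows = [
    ("LowTerm", 411.77, 268.23),
    ("HighTerm", 231.72, 177.52),
    ("Fuzzy1", 76.40, 83.67),
]

for task, base, comp in rows:
    # Negative values mean Compressing is slower than Lucene40.
    print(f"{task:<10}{pct_diff(base, comp):+6.1f}%")
```

Rounded to one decimal place, these three rows give -34.9%, -23.4% and +9.5%, which matches the reported column.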
Results are disappointing, and I'm surprised that some queries perform much
worse while others perform better. I'll dig, but if anyone has an idea why
(I'm not at all familiar with FastVectorHighlighter), I'd be interested to
hear their theory!
> Compressed term vectors
> -----------------------
>
> Key: LUCENE-4599
> URL: https://issues.apache.org/jira/browse/LUCENE-4599
> Project: Lucene - Core
> Issue Type: Task
> Components: core/codecs, core/termvectors
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 4.2
>
> Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log,
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks,
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch,
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors, similar to what we have for
> stored fields.