[jira] [Comment Edited] (LUCENE-5354) Blended score in AnalyzingInfixSuggester

Remi Melisson (JIRA) Thu, 09 Jan 2014 09:03:45 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866760#comment-13866760
 ]


Remi Melisson edited comment on LUCENE-5354 at 1/9/14 4:57 PM:
---------------------------------------------------------------

Hi!
Here is a new patch incorporating your comment on the coefficient calculation 
(I guess a lambda function would be perfect here!).
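
To illustrate the lambda remark, here is a minimal Java 8 sketch of a position 
coefficient passed around as a lambda. The class name PositionCoefficients, the 
LINEAR and RECIPROCAL formulas, and the blend helper are all assumptions made 
for illustration; they are not taken from the attached patch.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical helper, not part of the patch: position-based coefficients as lambdas.
class PositionCoefficients {

    // Assumed formula: lose 10% of the weight per token position, never below 0.
    static final IntToDoubleFunction LINEAR =
        position -> Math.max(0.0, 1.0 - 0.10 * position);

    // Assumed formula: reciprocal decay, 1 / (position + 1).
    static final IntToDoubleFunction RECIPROCAL =
        position -> 1.0 / (position + 1);

    // Scales a suggestion weight by the coefficient for the matched term's position.
    static long blend(long weight, int position, IntToDoubleFunction coefficient) {
        return Math.round(weight * coefficient.applyAsDouble(position));
    }

    public static void main(String[] args) {
        // A match at position 0 keeps its full weight; later matches are damped.
        System.out.println(blend(100, 0, LINEAR));
        System.out.println(blend(100, 2, LINEAR));
        System.out.println(blend(100, 3, RECIPROCAL));
    }
}
```

Passing the coefficient as a function keeps the weighting strategy swappable 
without touching the lookup code.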

I ran the performance test on my laptop; here are the results compared to the 
AnalyzingInfixSuggester:
-- construction time
AnalyzingInfixSuggester input: 50001, time[ms]: 1780 [+- 367.58]
BlendedInfixSuggester input: 50001, time[ms]: 6507 [+- 2106.52]
-- prefixes: 2-4, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 6804 [+- 1403.13], ~kQPS: 7
BlendedInfixSuggester queries: 50001, time[ms]: 26503 [+- 2624.41], ~kQPS: 2
-- prefixes: 6-9, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 3995 [+- 551.20], ~kQPS: 13
BlendedInfixSuggester queries: 50001, time[ms]: 5355 [+- 1295.41], ~kQPS: 9
-- prefixes: 100-200, num: 7, onlyMorePopular: false
AnalyzingInfixSuggester queries: 50001, time[ms]: 2626 [+- 588.43], ~kQPS: 19
BlendedInfixSuggester queries: 50001, time[ms]: 1980 [+- 574.16], ~kQPS: 25
-- RAM consumption
AnalyzingInfixSuggester size[B]:    1,430,920
BlendedInfixSuggester size[B]:    1,630,488

If you have any ideas on how we could improve the performance, let me know (see 
my comment above regarding your previous suggestion to avoid visiting term vectors).



> Blended score in AnalyzingInfixSuggester
> ----------------------------------------
>
>                 Key: LUCENE-5354
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5354
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>    Affects Versions: 4.4
>            Reporter: Remi Melisson
>            Priority: Minor
>              Labels: suggester
>         Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch, 
> LUCENE-5354_3.patch
>
>
> I'm working on a custom suggester derived from AnalyzingInfixSuggester. I 
> require what is called a "blended score" (see the //TODO at line 399 in 
> AnalyzingInfixSuggester) to transform the suggestion weights depending on the 
> position of the searched term(s) in the text.
> Right now, I'm using an easy solution:
> if I want 10 suggestions, I search the current ordered index for the first 
> 100 results and transform the weight either
> a) by using the term position in the text (found with TermVector and 
> DocsAndPositionsEnum)
> or
> b) by multiplying the weight by the score of a SpanQuery that I add when 
> searching,
> and then return the 10 suggestions with the highest updated weights.
> Since we usually don't need to suggest many things, the overhead of the 
> bigger search plus rescoring is not significant, but I agree that this is 
> not the most elegant solution.
> We could include this factor (here the position of the term) directly in 
> the index.
> So, I can contribute to this if you think it's worth adding.
> Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a 
> dedicated class?
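
The over-fetch-and-rescore approach in the description can be sketched in 
plain Java as follows. This is only an illustration under stated assumptions: 
the Candidate type, the topBlended method, and the reciprocal coefficient in 
the usage example are hypothetical, and the candidate list stands in for the 
larger result set (for example 100 hits when 10 suggestions are wanted) that 
the real code would fetch from the index together with term positions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch, not the patch: rescore an over-fetched candidate list.
class BlendedRescoreSketch {

    // A candidate suggestion: its text, its weight, and the position of the
    // first matched term in the text (0 = first token).
    static final class Candidate {
        final String text;
        final long weight;
        final int position;

        Candidate(String text, long weight, int position) {
            this.text = text;
            this.weight = weight;
            this.position = position;
        }
    }

    // Scales each weight by a position coefficient, sorts by the new weight
    // in descending order, and keeps at most num results.
    static List<Candidate> topBlended(List<Candidate> candidates, int num,
                                      IntToDoubleFunction coefficient) {
        List<Candidate> rescored = new ArrayList<>();
        for (Candidate c : candidates) {
            long blended = Math.round(c.weight * coefficient.applyAsDouble(c.position));
            rescored.add(new Candidate(c.text, blended, c.position));
        }
        rescored.sort(Comparator.comparingLong((Candidate c) -> c.weight).reversed());
        return new ArrayList<>(rescored.subList(0, Math.min(num, rescored.size())));
    }

    public static void main(String[] args) {
        List<Candidate> hits = new ArrayList<>();
        hits.add(new Candidate("intro to apache lucene", 90, 3));
        hits.add(new Candidate("apache lucene", 60, 1));
        hits.add(new Candidate("lucene in action", 50, 0));
        // Assumed coefficient: 1 / (position + 1).
        for (Candidate c : topBlended(hits, 2, pos -> 1.0 / (pos + 1))) {
            System.out.println(c.text + " " + c.weight);
        }
    }
}
```

With the assumed reciprocal coefficient, an early match with a lower weight can 
outrank a late match with a higher one, which is the intent of the blended score.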


