[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892131#comment-13892131
 ] 

Shai Erera commented on LUCENE-5425:
------------------------------------

I ran this on a 2013 Wikipedia dump w/ 6.7M docs (full docs, not 1K) and Date 
facet:

{noformat}
                   Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
       HighSloppyPhrase        1.80      (9.7%)        1.75      (6.1%)   -2.9% ( -16% -   14%)
              OrHighLow        5.53      (2.2%)        5.43      (2.7%)   -1.9% (  -6% -    3%)
             OrHighHigh        3.81      (2.2%)        3.74      (2.7%)   -1.8% (  -6% -    3%)
           OrHighNotLow        9.27      (2.1%)        9.13      (2.6%)   -1.5% (  -5% -    3%)
           HighSpanNear        3.77      (5.4%)        3.71      (6.2%)   -1.4% ( -12% -   10%)
           OrHighNotMed       14.84      (2.1%)       14.64      (2.5%)   -1.3% (  -5% -    3%)
          OrHighNotHigh        8.06      (2.4%)        7.96      (2.8%)   -1.2% (  -6% -    4%)
        MedSloppyPhrase        1.66      (7.2%)        1.64      (4.3%)   -1.2% ( -11% -   11%)
           OrNotHighLow       30.04      (4.6%)       29.71      (4.9%)   -1.1% ( -10% -    8%)
              OrHighMed       12.16      (2.2%)       12.04      (2.3%)   -1.0% (  -5% -    3%)
             HighPhrase        2.28     (10.2%)        2.26      (9.2%)   -0.7% ( -18% -   20%)
          OrNotHighHigh       13.08      (3.0%)       13.00      (3.1%)   -0.7% (  -6% -    5%)
                Respell       24.49      (3.3%)       24.33      (3.3%)   -0.6% (  -7% -    6%)
           OrNotHighMed       18.02      (4.1%)       17.99      (4.0%)   -0.2% (  -7% -    8%)
              LowPhrase        5.73      (7.0%)        5.72      (6.9%)   -0.2% ( -13% -   14%)
            MedSpanNear       14.97      (3.8%)       14.99      (4.3%)    0.1% (  -7% -    8%)
             AndHighLow      199.51      (2.9%)      200.05      (3.6%)    0.3% (  -6% -    6%)
            LowSpanNear        4.57      (4.0%)        4.59      (4.7%)    0.3% (  -8% -    9%)
              MedPhrase       79.00      (7.4%)       79.23      (6.3%)    0.3% ( -12% -   15%)
                 Fuzzy2       25.42      (3.0%)       25.56      (3.1%)    0.6% (  -5% -    6%)
                 Fuzzy1       35.84      (2.7%)       36.11      (3.7%)    0.7% (  -5% -    7%)
        LowSloppyPhrase       20.55      (2.7%)       20.73      (2.3%)    0.9% (  -4% -    6%)
               HighTerm       22.31      (3.7%)       22.59      (2.6%)    1.2% (  -4% -    7%)
             AndHighMed       16.17      (1.8%)       16.39      (2.3%)    1.3% (  -2% -    5%)
            AndHighHigh       15.85      (2.3%)       16.17      (1.7%)    2.1% (  -1% -    6%)
                MedTerm       26.51      (3.9%)       27.11      (4.0%)    2.3% (  -5% -   10%)
                LowTerm       98.07      (4.6%)      101.55      (5.5%)    3.5% (  -6% -   14%)
                 IntNRQ        8.61      (4.3%)        9.20      (4.6%)    6.9% (  -1% -   16%)
               Wildcard       12.96      (3.0%)       14.30      (3.6%)   10.3% (   3% -   17%)
                Prefix3       74.18      (2.7%)       96.70      (4.9%)   30.4% (  22% -   38%)
{noformat}

Results are consistent with yours. So should we proceed w/ the API change?

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
>                 Key: LUCENE-5425
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5425
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 4.6
>            Reporter: John Wang
>         Attachments: LUCENE-5425.patch, facetscollector.patch, 
> facetscollector.patch, fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> the current behavior.
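
For readers following along, here is a rough sketch of the pattern the description asks for: the bitset allocation moves into a protected factory method, so the default stays "one new bitset per query" while a subclass can plug in its own strategy. This is not the attached patch and not Lucene's API. `SketchCollector`, `ReusingCollector`, `createBits` and `collect` are hypothetical names, and `java.util.BitSet` stands in for FixedBitSet so the example runs without Lucene on the classpath.

```java
import java.util.BitSet;

// Hypothetical stand-in for the collector discussed in the issue; not Lucene's class.
class SketchCollector {
    /** Allocation hook. The default keeps the current behavior: a fresh bitset each time. */
    protected BitSet createBits(int maxDoc) {
        return new BitSet(maxDoc);
    }

    /** Marks the given doc IDs as hits and returns the bitset that holds them. */
    public final BitSet collect(int maxDoc, int... docIds) {
        BitSet bits = createBits(maxDoc);
        for (int doc : docIds) {
            bits.set(doc);
        }
        return bits;
    }
}

// Hypothetical subclass that reuses one bitset across queries to avoid garbage.
class ReusingCollector extends SketchCollector {
    private BitSet cached;

    @Override
    protected BitSet createBits(int maxDoc) {
        if (cached == null) {
            cached = new BitSet(maxDoc);
        } else {
            cached.clear(); // wipe the previous query's hits before reuse
        }
        return cached;
    }
}
```

The trade-off in this sketch is that `ReusingCollector` is not thread-safe and invalidates the previous query's result as soon as the next query starts, so a caller would have to consume the hits before collecting again.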



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

