[jira] [Commented] (LUCENE-4602) Use DocValues to store per-doc facet ord

Michael McCandless (JIRA) Tue, 15 Jan 2013 10:40:20 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554112#comment-13554112 ]


Michael McCandless commented on LUCENE-4602:
--------------------------------------------

I tested this with Wikipedia (avg ~25 ords per doc across 9 dimensions; 2.5M 
unique ords):

{noformat}
          Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
      PKLookup      188.75      (5.3%)      195.05      (2.1%)    3.3% (  -3% -   11%)
      HighTerm        3.26      (0.8%)        3.80      (3.4%)   16.3% (  12% -   20%)
       MedTerm        6.85      (0.8%)        9.14      (3.1%)   33.4% (  29% -   37%)
       LowTerm       14.45      (1.5%)       24.41      (2.1%)   68.9% (  64% -   73%)
{noformat}

Nice!

It's odd how the percentage gains are the same for High/Med/LowTerm ... not sure why.
                
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
>                 Key: LUCENE-4602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4602
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: FacetsPayloadMigrationReader.java, LUCENE-4602.patch, 
> LUCENE-4602.patch, LUCENE-4602.patch, LUCENE-4602.patch, 
> TestFacetsPayloadMigrationReader.java
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads.  I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                 HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
>                  LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
>                  MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on par with payloads (but hopefully faster) and if so ... we should
> cut over facets to using DV.
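
To make the idea in the issue description concrete, here is a minimal sketch of packing all of a document's facet ords into a single byte[] and counting them back out at collection time. It is plain Java and does not use any Lucene API; the class name FacetOrdCodec, the method names, and the choice of sorted delta-coded variable-length ints are assumptions for illustration, not the encoding used by the patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or of the LUCENE-4602 patch.
final class FacetOrdCodec {

    /** Encodes non-negative ords as sorted, delta-coded variable-length ints. */
    static byte[] encode(int[] ords) {
        int[] sorted = ords.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sorted) {
            int delta = ord - prev; // non-negative because the input is sorted
            prev = ord;
            // Emit 7 bits per byte, low bits first; high bit set means "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes one document's bytes and increments counts[ord] for every ord found. */
    static void countInto(byte[] bytes, int[] counts) {
        int ord = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            ord += delta; // undo the delta coding
            counts[ord]++;
        }
    }
}
```

In this sketch, a collector would fetch the per-document byte[] (from payloads or from DocValues) for each hit and call countInto with a counts array sized to the number of unique ords. Only the source of the bytes changes between the two storage options; the decode loop stays the same.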
