[jira] [Commented] (LUCENE-4602) Use DocValues to store per-doc facet ord

Shai Erera (JIRA) Tue, 15 Jan 2013 12:12:20 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554261#comment-13554261
 ]


Shai Erera commented on LUCENE-4602:
------------------------------------

Found your comments about CategoryListCache:

bq. I had separately previously tested the existing int[][][] cache 
(CategoryListCache) but it had smaller gains than this (73% for MedTerm), and 
it required more RAM (1.9 GB vs 377 MB RAM for this patch).

That was with 3 ordinals per document, and it was already consuming a very 
large chunk of RAM. Also, the gains are not considerable compared to the 
DocValues and PackedBytes versions that you had, and I assume that testing it 
with 25 ordinals per document is going to be even more costly.

In addition, CategoryListCache is not per-segment, so if we want to keep it, 
we'd need to do some major rewriting there. I suggest that we just nuke it for 
now and come back to it later. The problem with keeping it is that we'd need 
to maintain it, and if it's not that much better, I prefer to nuke it.
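
To make the RAM trade-off concrete, here is a minimal sketch of the two 
representations being compared: ordinals held as plain ints (as in an int[] 
cache) versus ordinals packed into a per-document byte[] as delta-encoded 
variable-length ints. The class and method names (FacetOrdsSketch, encode, 
count) are hypothetical, and the encoding is an assumption chosen for 
illustration; it is not taken from the patch or from the facet module's actual 
encoder, and no DocValues API is used.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the code from LUCENE-4602.patch.
final class FacetOrdsSketch {

    /** Packs sorted ordinals as deltas, each written as a variable-length int. */
    public static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int delta = ord - prev;
            prev = ord;
            // 7 payload bits per byte; the high bit marks "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes one document's packed ordinals and increments their counts. */
    public static void count(byte[] packed, int[] counts) {
        int ord = 0;
        int delta = 0;
        int shift = 0;
        for (byte b : packed) {
            delta |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7;        // continuation byte: keep accumulating
            } else {
                ord += delta;      // undo the delta encoding
                counts[ord]++;
                delta = 0;
                shift = 0;
            }
        }
    }

    public static void main(String[] args) {
        // Three made-up documents, each with a few sorted facet ordinals.
        int[][] docs = { {3, 10, 300}, {3, 7}, {10, 300} };
        int[] counts = new int[301];
        int packedBytes = 0;
        int plainBytes = 0;
        for (int[] ords : docs) {
            byte[] packed = encode(ords);
            packedBytes += packed.length;
            plainBytes += ords.length * Integer.BYTES; // payload only, no array headers
            count(packed, counts);
        }
        System.out.println("packed=" + packedBytes + " bytes, int[]=" + plainBytes + " bytes");
        System.out.println("count for ordinal 300: " + counts[300]);
    }
}
```

The plain int representation costs 4 bytes per ordinal plus per-array object 
overhead, while the packed form often needs 1 or 2 bytes per ordinal when the 
deltas are small. The actual savings depend on the ordinal distribution, so 
the sketch is not a substitute for the measurements quoted above.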
                
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
>                 Key: LUCENE-4602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4602
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: FacetsPayloadMigrationReader.java, LUCENE-4602.patch, 
> LUCENE-4602.patch, LUCENE-4602.patch, LUCENE-4602.patch, 
> TestFacetsPayloadMigrationReader.java
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads.  I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                 HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
>                  LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
>                  MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.

