[jira] [Updated] (LUCENE-4602) Use DocValues to store per-doc facet ord

Michael McCandless (JIRA) Sun, 09 Dec 2012 08:21:22 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-4602:
---------------------------------------

    Attachment: LUCENE-4602.patch

Patch with another prototype DV-backed collector.  This one only works
for single-valued fields (in "dimension" terminology, I think: the
document can have multiple dimensions as long as each dimension has
only one category path under it).

It stores only a single ord per (doc, field) pair in a
PackedLongDocValuesField.  The collector aggregates per-segment during
collection, counting only the leaf ords; at the end, it walks up the
hierarchy, summing counts into the parent ords.  It's the fastest impl
so far:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                HighTerm        0.71      (0.5%)        1.60      (2.3%)  124.1% ( 120% -  127%)
                 MedTerm        4.43      (0.5%)       31.39      (6.6%)  609.1% ( 599% -  618%)
                 LowTerm       10.28      (0.4%)       75.37      (3.8%)  633.0% ( 626% -  639%)
{noformat}

But it's somewhat hacked up (storing DV field directly myself)... Shai
explained that it's possible now to have facets store only the leaf
ord, so once we get DV cleanly integrated we should try that and
re-test.
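
For readers who want to see the leaf-only counting and roll-up in
code, here is a minimal sketch.  It is not the code from the patch:
the class and method names are hypothetical, a plain int[] stands in
for the per-doc DocValues lookup, it handles one field and ignores
per-segment handling, and it assumes every parent ord is smaller than
its children's ords, with -1 marking the root's parent.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from the patch.
public class LeafOrdRollup {

    /**
     * Counts one leaf ord per matching document.
     * docToLeafOrd stands in for the per-doc DocValues lookup (assumption).
     */
    public static int[] countLeaves(int[] docToLeafOrd, int[] matchingDocs, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : matchingDocs) {
            counts[docToLeafOrd[doc]]++;
        }
        return counts;
    }

    /**
     * Adds each ord's count into its parent's count.
     * Assumes parents[ord] < ord, so walking from the highest ord down
     * guarantees a child's total is final before it is added to its parent.
     */
    public static void rollUp(int[] counts, int[] parents) {
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
    }

    public static void main(String[] args) {
        // ords: 0 = root, 1 = "Author", 2 = "Author/Bob", 3 = "Author/Lisa"
        int[] parents = {-1, 0, 1, 1};
        // one leaf ord per document (single-valued field)
        int[] docToLeafOrd = {2, 3, 2, 2};
        // documents matched by the query
        int[] hits = {0, 1, 3};

        int[] counts = countLeaves(docToLeafOrd, hits, parents.length);
        rollUp(counts, parents);
        System.out.println(Arrays.toString(counts)); // prints [3, 3, 2, 1]
    }
}
```

The point of the split is that the per-hit loop does a single array
increment, and the cost of the hierarchy walk depends on the number of
ords rather than the number of hits.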

                
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
>                 Key: LUCENE-4602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4602
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads.  I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                 HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
>                  LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
>                  MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
