[ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4600:
---------------------------------------

    Attachment: LUCENE-4600-cli.patch

bq. Also, you can obtain the right IntDecoder from the CLP for decoding the 
ordinals. That would remove the hard dependency on VInt+gap, and allow e.g. to 
use a PackedInts decoder.

I tried this: the attached patch changes CountingFacetsCollector to
use CategoryListIterator. Alas, those abstractions are apparently
costing us in this hotspot (unless I screwed something up in the
patch?  E.g., that null I pass is kinda spooky!):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                HighTerm        0.86      (4.7%)        0.56      (0.4%)  -34.4% ( -37% -  -30%)
                 MedTerm        5.85      (1.0%)        5.04      (0.5%)  -13.9% ( -15% -  -12%)
                 LowTerm       11.82      (0.6%)       11.02      (0.5%)   -6.8% (  -7% -   -5%)
{noformat}

base is the original CountingFacetsCollector and comp is the new one
using the CategoryListIterator API.

I think we should try to invoke specialized collectors when possible?
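
To make "specialized" concrete, here is a rough sketch of the kind of
loop I mean: decode the VInt+gap bytes and bump the counts in one
tight pass, with no decoder or iterator objects in between. This is
not the code from either patch. The class and method names are made
up for illustration, and the byte layout (low-order 7 bits first,
high bit set on every byte except the last of a value) is an
assumption that may not match what the facet module actually writes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; none of these names exist in Lucene.
final class VIntGapCounting {

  // Encodes ascending ordinals as gaps from the previous ordinal.
  // Each gap is a VInt: 7 payload bits per byte, low-order bits first,
  // high bit set on every byte except the last (assumed layout).
  static byte[] encode(int[] sortedOrdinals) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sortedOrdinals) {
      int gap = ord - prev;
      prev = ord;
      while ((gap & ~0x7F) != 0) {
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
    }
    return out.toByteArray();
  }

  // Specialized path: decode and count in a single loop over the bytes.
  static void countInline(byte[] buf, int offset, int length, int[] counts) {
    int end = offset + length;
    int ord = 0;    // running ordinal, rebuilt by summing the gaps
    int value = 0;  // gap currently being decoded
    int shift = 0;
    for (int i = offset; i < end; i++) {
      byte b = buf[i];
      value |= (b & 0x7F) << shift;
      if (b >= 0) {
        // High bit clear: this byte ends the current gap.
        ord += value;
        counts[ord]++;
        value = 0;
        shift = 0;
      } else {
        shift += 7;
      }
    }
  }
}
```

The downside is obvious: a loop like this only works for one encoding,
so it could only be picked when the category list is known to use
VInt+gap, with the CategoryListIterator path as the general fallback.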

                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
