[ 
https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527623#comment-13527623
 ] 

Shai Erera commented on LUCENE-4602:
------------------------------------

bq. Shai explained that it's possible now to have facets store only the leaf ord

This is a long shot (I haven't tried it yet), but I think that if you implement 
an OrdinalPolicy whose shouldAdd() always returns false, then only the leaf node 
will be written. I'm basing this on the CategoryParentStream.incrementToken() 
code, which CategoryDocumentBuilder uses to encode all the parents:

{code}
      int ordinal = this.ordinalProperty.getOrdinal();
      if (ordinal != -1) {
        ordinal = this.taxonomyWriter.getParent(ordinal);
        if (this.ordinalPolicy.shouldAdd(ordinal)) {
          this.ordinalProperty.setOrdinal(ordinal);
          try {
            this.categoryAttribute.addProperty(ordinalProperty);
          } catch (UnsupportedOperationException e) {
            throw new IOException(e.getLocalizedMessage());
          }
          added = true;
        } else {
          this.ordinalProperty.setOrdinal(-1);
        }
      }
{code}

It looks like the leaf ordinal is added regardless, though I haven't tracked 
down where yet. If you don't get to try it today, I'll verify tomorrow that an 
OrdinalPolicy that always returns false indeed ends up with just the leaf nodes 
written.
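
To make the idea concrete, here is a minimal, self-contained sketch of the parent walk from the snippet above, run against a toy taxonomy. It is not the facet module's code: OrdinalPolicySketch is a hypothetical stand-in that models only the shouldAdd(int) call visible in the snippet, the parents array replaces taxonomyWriter.getParent(), and the unconditional write of the leaf ordinal is exactly the unverified assumption discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OrdinalPolicy; only shouldAdd(int), the call
// seen in the quoted snippet, is modeled here.
interface OrdinalPolicySketch {
    boolean shouldAdd(int ordinal);
}

final class LeafOnlySketch {
    // The policy proposed in this comment: reject every parent ordinal.
    static final OrdinalPolicySketch NO_PARENTS = ordinal -> false;

    // Illustrative contrast: accept every parent except the root (ordinal 0).
    static final OrdinalPolicySketch NON_ROOT_PARENTS = ordinal -> ordinal > 0;

    // parents[i] is the parent of ordinal i, or -1 if it has none
    // (toy replacement for taxonomyWriter.getParent(i)).
    static List<Integer> ordinalsWritten(int leaf, int[] parents,
                                         OrdinalPolicySketch policy) {
        List<Integer> written = new ArrayList<>();
        written.add(leaf); // ASSUMPTION: the leaf is written regardless of policy
        int ordinal = leaf;
        while (ordinal != -1) {
            ordinal = parents[ordinal];
            if (ordinal != -1 && policy.shouldAdd(ordinal)) {
                written.add(ordinal);
            } else {
                ordinal = -1; // mirrors ordinalProperty.setOrdinal(-1): stop walking
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int[] parents = {-1, 0, 1}; // 0 = root, 1 = child of root, 2 = leaf under 1
        System.out.println(ordinalsWritten(2, parents, NO_PARENTS));
        System.out.println(ordinalsWritten(2, parents, NON_ROOT_PARENTS));
    }
}
```

Under these assumptions the always-false policy leaves only the leaf ordinal in the list, while the contrasting policy also records the intermediate parent. Whether the real indexing chain behaves the same way still needs the verification mentioned above.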

We should compare the current code + leaves only to the DV code + leaves only, 
to get a better estimate of the gains we should expect when moving to DV.
                
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
>                 Key: LUCENE-4602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4602
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads.  I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>                     Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
>                 HighTerm        0.53      (0.9%)        1.00      (2.5%)   87.3% (  83% -   91%)
>                  LowTerm        7.59      (0.6%)       26.75     (12.9%)  252.6% ( 237% -  267%)
>                  MedTerm        3.35      (0.7%)       12.71      (9.0%)  279.8% ( 268% -  291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.
