[ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527623#comment-13527623 ]
Shai Erera commented on LUCENE-4602:
------------------------------------

bq. Shai explained that it's possible now to have facets store only the leaf ord

This is a long shot (I haven't tried it yet), but I think that if you implement an OrdinalPolicy whose shouldAdd() always returns false, then only the leaf node will be written. I'm looking at the CategoryParentStream.incrementToken() code, which CategoryDocumentBuilder uses to encode all the parents:

{code}
int ordinal = this.ordinalProperty.getOrdinal();
if (ordinal != -1) {
  ordinal = this.taxonomyWriter.getParent(ordinal);
  if (this.ordinalPolicy.shouldAdd(ordinal)) {
    this.ordinalProperty.setOrdinal(ordinal);
    try {
      this.categoryAttribute.addProperty(ordinalProperty);
    } catch (UnsupportedOperationException e) {
      throw new IOException(e.getLocalizedMessage());
    }
    added = true;
  } else {
    this.ordinalProperty.setOrdinal(-1);
  }
}
{code}

It looks like the leaf ordinal is added anyway, though I still haven't tracked down where ... if you don't get to try it today, I'll verify tomorrow that an OrdinalPolicy that always returns false indeed ends up with just leaf nodes written.

We should compare the current code + leaves only to the DV code + leaves only, to get a better estimate of the gains we should expect when moving to DV.

> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
>                 Key: LUCENE-4602
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4602
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads.
> I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
> Task       QPS base  StdDev   QPS comp   StdDev   Pct diff
> HighTerm       0.53  (0.9%)       1.00   (2.5%)     87.3% (  83% -  91%)
> LowTerm        7.59  (0.6%)      26.75  (12.9%)    252.6% ( 237% - 267%)
> MedTerm        3.35  (0.7%)      12.71   (9.0%)    279.8% ( 268% - 291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.
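
For reference, the leaf-only idea from the comment above can be sketched in isolation. This is a minimal, self-contained model and not the Lucene code: the nested OrdinalPolicy interface is a simplified stand-in with only the shouldAdd(int) method seen in the excerpt, the parents array and the ordinalsToWrite helper are hypothetical, and the sketch assumes (as the comment suspects but has not yet verified) that the leaf ordinal is written regardless of the policy. It also assumes -1 means "no parent".

```java
import java.util.ArrayList;
import java.util.List;

class LeafOnlyOrdinalsSketch {

    /** Simplified stand-in for the policy consulted in the excerpt; not the Lucene interface. */
    interface OrdinalPolicy {
        boolean shouldAdd(int ordinal);
    }

    /** The policy proposed in the comment: reject every parent ordinal. */
    static final OrdinalPolicy NO_PARENTS = ordinal -> false;

    /** A policy that accepts every parent, for comparison. */
    static final OrdinalPolicy ALL_PARENTS = ordinal -> true;

    /**
     * Returns the ordinals that would be written for one category.
     * parents[i] is the parent ordinal of i; -1 means "no parent" (assumption).
     */
    static List<Integer> ordinalsToWrite(int leaf, int[] parents, OrdinalPolicy policy) {
        List<Integer> written = new ArrayList<>();
        written.add(leaf); // assumption: the leaf is written no matter what the policy says
        int ordinal = leaf;
        while (ordinal != -1) {
            ordinal = parents[ordinal];
            if (ordinal != -1 && policy.shouldAdd(ordinal)) {
                written.add(ordinal);
            } else {
                ordinal = -1; // mirrors setOrdinal(-1) in the excerpt: the walk stops here
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0 = root, 1 = child of root, 2 = child of 1 (the leaf)
        int[] parents = {-1, 0, 1};
        System.out.println(ordinalsToWrite(2, parents, ALL_PARENTS)); // prints [2, 1, 0]
        System.out.println(ordinalsToWrite(2, parents, NO_PARENTS));  // prints [2]
    }
}
```

In this model, a policy that always returns false stops the parent walk at the first step, so each category contributes a single ordinal. Whether the real indexing chain behaves the same way is exactly what the comment proposes to verify.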