[ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530923#comment-13530923 ]
Shai Erera commented on LUCENE-4602:
------------------------------------
Who said anything about blocking progress? All I'm saying is that before we
release these improvements, we need to have a migration plan. These are not
just my customers. I think that there are other people who use this module, at
least judging by the questions that pop up here and there on the list.

Sure, we can tell everyone to re-index. But that's not how I prefer to work. I
don't think that cutting over to DV is the only migration we should talk about.
E.g. LUCENE-4623 would also require migration and any change in the future to
how we decide to store/encode facets would require migration.

It would be good if we could think about a layer that would provide that
migration. Today we have Codecs and Lucene guarantees that old segments will be
read w/ old Codecs versions (per our back-compat policy). What I would like to
develop is something similar, which can read facets from old segments in the
old way, and ultimately when segments are merged, migrate data to the new
format. Then we can tell customers that if they didn't migrate their indexes
when Lucene 6.0 is released, they have to addIndexes or forceMerge or something.
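
To make the idea concrete, here is a minimal sketch of such a layer. It is not Lucene code: `FacetMigrationSketch`, `FacetFormat`, `Segment`, and `OrdCodec` are hypothetical names, and the two encodings (comma-separated text for "old", fixed 4-byte ints for "new") are toy stand-ins for whatever the real old and new facet formats would be. The only point it illustrates is the shape of the proposal: read each segment with the codec it was written with, and always write the newest format when merging.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Lucene APIs.
final class FacetMigrationSketch {

    /** Format tag recorded per segment (both formats are toy assumptions). */
    enum FacetFormat { OLD_TEXT, NEW_FIXED_INT }

    /** Stand-in for an index segment: a format tag plus one encoded blob per document. */
    static final class Segment {
        final FacetFormat format;
        final List<byte[]> perDocBytes;

        Segment(FacetFormat format, List<byte[]> perDocBytes) {
            this.format = format;
            this.perDocBytes = perDocBytes;
        }
    }

    /** Decodes and encodes one document's facet ordinals in a single format. */
    interface OrdCodec {
        int[] decode(byte[] bytes);
        byte[] encode(int[] ords);
    }

    /** Toy "old" format: ordinals as comma-separated ASCII text. */
    static final OrdCodec OLD = new OrdCodec() {
        public int[] decode(byte[] bytes) {
            String text = new String(bytes, StandardCharsets.US_ASCII);
            if (text.isEmpty()) return new int[0];
            String[] parts = text.split(",");
            int[] ords = new int[parts.length];
            for (int i = 0; i < parts.length; i++) ords[i] = Integer.parseInt(parts[i]);
            return ords;
        }
        public byte[] encode(int[] ords) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < ords.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(ords[i]);
            }
            return sb.toString().getBytes(StandardCharsets.US_ASCII);
        }
    };

    /** Toy "new" format: each ordinal as a fixed 4-byte big-endian int. */
    static final OrdCodec NEW = new OrdCodec() {
        public int[] decode(byte[] bytes) {
            IntBuffer ints = ByteBuffer.wrap(bytes).asIntBuffer();
            int[] ords = new int[ints.remaining()];
            ints.get(ords);
            return ords;
        }
        public byte[] encode(int[] ords) {
            ByteBuffer buf = ByteBuffer.allocate(4 * ords.length);
            for (int ord : ords) buf.putInt(ord);
            return buf.array();
        }
    };

    static OrdCodec codecFor(FacetFormat format) {
        return format == FacetFormat.OLD_TEXT ? OLD : NEW;
    }

    /** Read path: a segment is always decoded by the codec it was written with. */
    static int[] readOrds(Segment segment, int doc) {
        return codecFor(segment.format).decode(segment.perDocBytes.get(doc));
    }

    /** Merge path: decode per source segment, re-encode everything in the newest format. */
    static Segment merge(List<Segment> segments) {
        List<byte[]> merged = new ArrayList<>();
        for (Segment segment : segments) {
            for (int doc = 0; doc < segment.perDocBytes.size(); doc++) {
                merged.add(NEW.encode(readOrds(segment, doc)));
            }
        }
        return new Segment(FacetFormat.NEW_FIXED_INT, merged);
    }
}
```

With something of this shape, an index holding a mix of old and new segments stays readable, and old data disappears gradually as merges rewrite it, so a forced merge would be enough to finish the migration.
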
I know that this module has the @lucene.experimental tag on it all over the
place, but I don't treat it as experimental at all. I would prefer that you
help me develop this migration layer, even if just by contributing ideas,
rather than tell me that it's my problem and that I get paid to solve it :).

I don't think that we should release code that breaks all apps out there and
forces them to reindex. Unless the changes are really non-migratable. But in
this case, I think it should be easy? If you want to chime in, I'll open a
separate issue to discuss this.
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
> Key: LUCENE-4602
> URL: https://issues.apache.org/jira/browse/LUCENE-4602
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: LUCENE-4602.patch, LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads. I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>     Task    QPS base   StdDev    QPS comp   StdDev          Pct diff
> HighTerm        0.53   (0.9%)        1.00   (2.5%)     87.3% (  83% -  91%)
>  LowTerm        7.59   (0.6%)       26.75  (12.9%)    252.6% ( 237% - 267%)
>  MedTerm        3.35   (0.7%)       12.71   (9.0%)    279.8% ( 268% - 291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.
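
As an illustration of "the byte[] encoding all facet ords for the document" in the description above, here is a small self-contained sketch. The issue text does not say which encoding is used, so the scheme here (sort the ordinals, store the gaps between them, write each gap as a variable-length int) is an assumption, and `FacetOrdBytes` is a hypothetical helper, not a Lucene class. The resulting byte[] is the kind of per-document value that could be stored either in a payload or in a binary doc-values field.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper: an assumed delta + variable-length-int encoding of facet ordinals.
final class FacetOrdBytes {

    /** Encodes non-negative ordinals: sorted, stored as gaps, 7 bits per byte. */
    static byte[] encode(int[] ords) {
        int[] sorted = ords.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sorted) {
            int gap = ord - prev;
            prev = ord;
            // Low 7 bits first; the high bit marks "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Decodes the byte[] back into ordinals in ascending order. */
    static int[] decode(byte[] bytes) {
        int[] ords = new int[bytes.length]; // upper bound: every ordinal uses at least one byte
        int count = 0, prev = 0, gap = 0, shift = 0;
        for (byte b : bytes) {
            gap |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // continuation bit set: keep accumulating this gap
                continue;
            }
            prev += gap;
            ords[count++] = prev;
            gap = 0;
            shift = 0;
        }
        return Arrays.copyOf(ords, count);
    }
}
```

Counting facets for a matching document then amounts to fetching its byte[], decoding it, and incrementing one counter per ordinal. How the byte[] is fetched (payload versus doc values) is exactly the part the benchmark in the description compares.
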