[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824826#comment-13824826 ]
Michael McCandless commented on LUCENE-5339:
--------------------------------------------

bq. I was thinking maybe we can wrap an IW,

I think that's a good idea? I added to TODO.

bq. Another crazy idea is to implement a FacetDocument which extends Document

I'm not sure how this can work, since in order to write the ords we need to see all FacetFields? Ie, at what point would we compile all the FacetFields into the BDV field?

bq. we can open up the extension point (I think it's FacetIW.dedupAndEncode) to let the app add the DIM ordinal too?

Hmm, we could open that up, but ... I think that's "too late"? You can't easily know which dim ords to add back at that point. I added a TODO.

{quote}
bq. Seriously? What abstractions are we missing?

Well, FacetArrays committing to an int[], while we have someone which wants to use a Map, because he has millions of facets, yet queries typically hit very few of them.
{quote}

With the simplified APIs this user could just make a custom facet method?

bq. Another abstraction is LUCENE-5316, which we struggle to get it to perform (really frustrating!).

I agree we need better abstraction here... the 3 int we require per unique facet label is costly. But, I don't think we need to force SSDVFacets to use this abstraction?

bq. Why have two faceting modules? A branch is the perfect choice here, since it allows us to move the entire module to the new API. And on the way we benefit by assessing that the new API can really allow implementing associations, complements, sampling.

I would also prefer to have a single facet module after all this, but if the requirements (rich functionality vs. simple API) are too divergent, then two modules is at least an option. Progress not perfection...

bq. I may compromise on complements and sampling because: (1) complements is not per-segment and I think it might require a full rewrite anywhere (there's an issue for it) and (2) sampling because it's under the *.old package, meaning it was never fully migrated to the new APIs and I think it could use some serious simplification there too.

I agree it's bad that complements is "top level"; everything else in the facet module is NRT friendly and I think we should stick with that. I'll work on associations...

bq. I personally like code reuse, and it kills me to see code that's duplicated. Often this calls out for bad design of APIs, not necessarily simplification.

I think this is a precarious balance. If a little code dup can greatly simplify the APIs, then that's the better tradeoff.

bq. Wrong? I mean you still didn't write a version which pulls the ords from OrdinalsCache (and I assume you don't want to get rid of it!).

You're right, the ords cache filling will be another place that "bakes in" the decoding. So, I agree: if we can find a clean way to abstract the encoding/source then let's pursue that.

> Simplify the facet module APIs
> ------------------------------
>
>                 Key: LUCENE-5339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5339
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch, LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder). I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under. If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued. From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc. This is just a start ...
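
To make the search-time shape described above a bit more concrete (instantiate the counting class directly from what was collected, then ask it for labels + values), here is a minimal, self-contained sketch. It is not the code from the patch and does not use any Lucene classes: FacetCountsSketch, LabelAndValue, getTopN and getSpecificValue are hypothetical names chosen for illustration, a plain String[] stands in for the taxonomy, and a list of ordinal arrays stands in for whatever the collector gathered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search-time counting class; NOT a Lucene API.
final class FacetCountsSketch {

    /** A label paired with the number of matching documents that carry it. */
    static final class LabelAndValue {
        final String label;
        final int value;

        LabelAndValue(String label, int value) {
            this.label = label;
            this.value = value;
        }

        @Override
        public String toString() {
            return label + " (" + value + ")";
        }
    }

    private final String[] labels; // ordinal -> label (assumed taxonomy stand-in)
    private final int[] counts;    // ordinal -> count, dense like an int[] counts array

    /**
     * Counts eagerly at construction time.
     *
     * @param labels ordinal-to-label table
     * @param hits   one int[] of label ordinals per matching document
     *               (assumed stand-in for the collector's output)
     */
    FacetCountsSketch(String[] labels, List<int[]> hits) {
        this.labels = labels.clone();
        this.counts = new int[labels.length];
        for (int[] ords : hits) {
            for (int ord : ords) {
                counts[ord]++;
            }
        }
    }

    /** Up to n labels with non-zero counts, highest count first; ties go to the lower ordinal. */
    List<LabelAndValue> getTopN(int n) {
        List<Integer> ords = new ArrayList<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) {
                ords.add(ord);
            }
        }
        ords.sort((a, b) -> counts[a] != counts[b]
                ? Integer.compare(counts[b], counts[a])
                : Integer.compare(a, b));
        List<LabelAndValue> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, ords.size()); i++) {
            int ord = ords.get(i);
            top.add(new LabelAndValue(labels[ord], counts[ord]));
        }
        return top;
    }

    /** Count for one specific label, or 0 if the label is unknown. */
    int getSpecificValue(String label) {
        for (int ord = 0; ord < labels.length; ord++) {
            if (labels[ord].equals(label)) {
                return counts[ord];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String[] labels = {"Author/Bob", "Author/Lisa", "Year/2012"};
        List<int[]> hits = new ArrayList<>();
        hits.add(new int[] {0, 2}); // doc 1: Bob, 2012
        hits.add(new int[] {1, 2}); // doc 2: Lisa, 2012
        hits.add(new int[] {1});    // doc 3: Lisa
        FacetCountsSketch facets = new FacetCountsSketch(labels, hits);
        for (LabelAndValue lv : facets.getTopN(2)) {
            System.out.println(lv);
        }
    }
}
```

The sketch counts into a dense int[], which is the simplest choice when the label space is small. For the case raised in the comment (millions of labels, few of them hit per query), a custom counting class could presumably swap the array for a map keyed by ordinal without changing how callers pull results.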