[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822450#comment-13822450 ]
Shai Erera commented on LUCENE-5339: ------------------------------------ About the patch: * I think FacetField should an optional ctor taking the indexedFacetField, defaulting to $facets, then the ctor calls super() with the right field, and not "dummy"? And you can remove set/get? * SimpleFacetsCollector jdocs are wrong -- there's no create? * Do we still need SSDVFacetFields? * I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me think if down the road we won't be sorry about doing this change (i.e. if anything changes on IW APIs). The good thing about FacetFields is that it just adds fields to a Document, and doesn't worry about IW API at all... * DimType == DimConfig :). Sorry if I wasn't clear somewhere in my long response. {quote} That handler is such an exotic use case ... and the app can just recurse itself, calling TaxoFacetCounts.getTopChildren? {quote} Could be, maybe it will work. It definitely allows asking for different topK for each child (something we currently don't support). {quote} It does, and I think that's OK? Yes, it's one extra ord it indexes, but that will be a minor perf hit since it's a multi-valued and hierarchical. {quote} I don't know. Like if your facet has 2 levels, that's 33% more ords. I think the count of the root ord is most likely never needed? And if it's needed, app can compute it by traversing its children and their values in the facet arrays? Maybe as a default we just never index it, and don't add a vague requiresDimCount/Value/Weight boolean? {quote} I think replicating bits of this code for the different methods is the lesser evil than adding so many abstractions that the module is hard to approach by new devs/users. {quote} Did we ever get such feedback from users? That the module is unapproachable? I get the opposite feedback - that we don't have many abstractions! :) {quote} At the end of the day, the code that does the facet counting, the rollup, pulling the topK, is in fact a small amount of code; {quote} You have a nocommit "maybe we should do this lazily" in regards for when to rollupValues. That shows me that now every developer who extends this API (let's be clear - users are oblivious to this happening) will facet the same decision (nocommit). If we discover one day that it's better to rollup lazily or not, other developers don't benefit from that decision. That's why I think some abstractions are good. {quote} I added ValueSource aggregation in the next patch, but not associations; I think associations can come later (it's just another index time and search time impl). {quote} I'm not sure we should do that (cut over associations later). The whole point about these features (associations, complements, sampling..) is that they are existing features. If we think they are useless / unneeded - that's one thing. But if we believe they are important, it's useless to make all the API changes without taking them into account, only to figure out later that we need abstraction X and Y in order to implement them. And we make heavy use of associations, and some users asked (and use) sampling and I remember a question about complements. So obviously we cannot conclude that these are useless features. Therefore I think it's important that we try to tackle them now, so don't we don't do a full round trip to find ourselves with the same API again. Can we do FacetIndexWriter in a separate issue (if we want to do it at all)? It's unrelated to the search API changes you want to do here, and it might be easier to contain within a single issue? About CategoryListIterator ... what if we do manage to come up tomorrow with a better encoding strategy for facets. Do you really think that changing all existing WhateverFacets makes sense!? And if developers write their own WhateverFacets, it means they need to change their code too? Really, you're mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that there are apps that can benefit from a different encoding scheme (e.g. FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better default encoding scheme to introduce abstractions. I mean .. that's just sounds crazy to me. > Simplify the facet module APIs > ------------------------------ > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet > Reporter: Michael McCandless > Assignee: Michael McCandless > Attachments: LUCENE-5339.patch, LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). > Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. > I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org