[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822721#comment-13822721 ]
Michael McCandless commented on LUCENE-5339: -------------------------------------------- Thanks for the feedback Shai. bq. I think FacetField should an optional ctor taking the indexedFacetField, defaulting to $facets, then the ctor calls super() with the right field, and not "dummy"? And you can remove set/get? I moved the indexed field name to the DimConfig. bq. SimpleFacetsCollector jdocs are wrong – there's no create? I removed it and put nocommit. bq. Do we still need SSDVFacetFields? Woops, no; I removed it. bq. I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me think if down the road we won't be sorry about doing this change (i.e. if anything changes on IW APIs). The good thing about FacetFields is that it just adds fields to a Document, and doesn't worry about IW API at all... I think that's an acceptable risk in exchange for the simpler index-time API. bq. DimType == DimConfig . Sorry if I wasn't clear somewhere in my long response. Ahh, OK. I'm still wondering if we should put the "facetMethod" onto the DimConfig... bq. Like if your facet has 2 levels, that's 33% more ords. I think the count of the root ord is most likely never needed? And if it's needed, app can compute it by traversing its children and their values in the facet arrays? Maybe as a default we just never index it, and don't add a vague requiresDimCount/Value/Weight boolean? Wait, the app cannot compute this (accurately) by summing the child counts? It will overcount in general, right? {quote} bq. I think replicating bits of this code for the different methods is the lesser evil than adding so many abstractions that the module is hard to approach by new devs/users. Did we ever get such feedback from users? That the module is unapproachable? {quote} It's mostly from my own assessment, looking at the code, and my own frustrations over time in trying to improve the facets module; LUCENE-5333 was finally the straw (for me)... I find complex APIs frustrating and I think it's a serious barrier to new contributors getting involved and users consuming them. There was a user on the ML (I don't have the link) who just wanted to get the facet count for a specific label after faceting was done, and the hoops s/he had to jump through (custom FacetResultHandler I think??) to achieve that was just crazy. bq. I get the opposite feedback - that we don't have many abstractions! Seriously? What abstractions are we missing? If this is too much change for the facets module, we could, instead, leave the facets module as is, and break this effort out as a different module (facets2, simplefacets, something?). We also have many queryparsers, many highlighters, etc., and I think that's healthy: all options can be explored. bq. You have a nocommit "maybe we should do this lazily" in regards for when to rollupValues. That shows me that now every developer who extends this API (let's be clear - users are oblivious to this happening) will facet the same decision (nocommit). If we discover one day that it's better to rollup lazily or not, other developers don't benefit from that decision. That's why I think some abstractions are good. It's a crazy expert thing to create another faceting impl, so I think such developers can handle changing their rollup to be lazy if it's beneficial/necessary for their use case. bq. I'm not sure we should do that (cut over associations later). The whole point about these features (associations, complements, sampling..) is that they are existing features. If we think they are useless / unneeded - that's one thing. But if we believe they are important, it's useless to make all the API changes without taking them into account, only to figure out later that we need abstraction X and Y in order to implement them. Well this could be a good argument for just making a new module? The new module would have a simpler API and less thorough functionality? bq. Can we do FacetIndexWriter in a separate issue (if we want to do it at all)? It's unrelated to the search API changes you want to do here, and it might be easier to contain within a single issue? I'm not sure it's so easily separated out; the DimConfig is common to index time and search time, and we're still iterating on that (I just moved the indexedFieldName into it). bq. About CategoryListIterator ... what if we do manage to come up tomorrow with a better encoding strategy for facets. Do you really think that changing all existing WhateverFacets makes sense!? And if developers write their own WhateverFacets, it means they need to change their code too? Really, you're mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that there are apps that can benefit from a different encoding scheme (e.g. FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better default encoding scheme to introduce abstractions. I mean .. that's just sounds crazy to me. We'd have to change 3 places if we did that right now: FacetIndexWriter (where we encode) and TaxonomyFacetCounts/SumValueSource (where we decode). Maybe... we can have a decode method in TFC, and TFSVC subclasses TFC, or ... something ... but I'd like to be very careful in just adding back abstractions here: this really is a small amount of code at the end of the day. > Simplify the facet module APIs > ------------------------------ > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet > Reporter: Michael McCandless > Assignee: Michael McCandless > Attachments: LUCENE-5339.patch, LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). > Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. > I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org