[ https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822232#comment-13822232 ]
Shai Erera commented on LUCENE-5339: ------------------------------------ bq. Can we rename CategoryPath to FacetLabel or something more intuitive? CategoryPath is all about the path components of a category. So the name is not entirely out of place. But FacetLabel is a nice too, and definitely aligns w/ the rest of the APi, e.g. FacetResultNode has a 'label' member, not 'path'. But, CP is such a core class, that renaming it will change all of the facet module's classes and every app out there (while the changes in this issue are less drastic), so I just wonder if it's a must or nice-to-have. I don't object to it, just saying it's a rote renaming of a core class. Anyway, if we want to do this, I suggest we do it under a separate issue, just to keep things contained. bq. Can there be sugar like addFields(doc, FacetLabel) so a user doesnt have to make lists etc? I think the varargs version is good to have, though in practice (except for tests), I never found the need to use one because usually the facets are added by reading them from somewhere, so you anyway need to construct a List or Array. It's only in tests, mock demos, or very specific applications that such a vararg method will be useful. If we have a super FacetFields class, then this method could just be final and delegate to the List variant as you noted in the code example. Otherwise, we just need to make sure we add it to every FF, so we're consistent. bq. new LongRange("less than 10", 0L, true, 10L, false) <-- can we make it so this is less arguments? You mean omitting the inclusive params, maybe defaulting to true, or maybe minInclusive=true and maxInclusive=false (really depends how you view the max value)? I guess we could do that. But then we should do that in all other Range types (there are only 3). bq. For the Taxo case, I think the "document build" process is still too complicated. Why the "Taxo" case? TaxoIndex and SortedSet have similar APIs -- SimpleFacetFields (maybe we want to rename it to TaxonomyFacetFields) and SortedSetDVFacetFields. Both are needed because they add both drill-down terms and BinaryDV (Taxo) or SortedSetDV (SortedSet) fields. A FacetIndexWriter is a nice idea, but we'd need variants for Taxo, SortedSet and Mike explores an enum-method (LUCENE-5326). What I like less about it is that today the app can use two FacetFields, say one for Enum and one for SortedSet (or whatever other combination) and I don't see how it will do that using FacetIW. Hmm ... or what we could do is have a single FacetIW, but different field types, e.g. TaxonomyFacetField, SortedSetFacetField and EnumFacetField. FacetIW will capture them and act accordingly. From the app's perspective, the interaction w/ the APIs is simpler, since it just adds another field to the document. Then we don't even need to change FacetFields APIs (varargs or not). I think I like it! Can we do this in a separate issue? I think it's lower hanging fruit than the changes proposed in this issue, and are also focused (nuking FacetFields essentially in exchange for dedicated Field extensions). What I also like about FacetIW is that if we want to, we can wrap the app's Codec with a more suitable FacetCodec. E.g. maybe there's a better way to encode the facet information than a plain BinaryDV (maybe it's a special BDV). Asking an app to set a Codec is rather high entry-point. Note that I'm not talking about apps that e.g. want to configure the facet DV to use DirectDVF. That's ok. I'm talking about if we think there's a better way to encode the data for *all* (or maybe depending on a schema) faceted-search apps - then requiring a special Codec even less approachable API. > Simplify the facet module APIs > ------------------------------ > > Key: LUCENE-5339 > URL: https://issues.apache.org/jira/browse/LUCENE-5339 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet > Reporter: Michael McCandless > Assignee: Michael McCandless > Attachments: LUCENE-5339.patch > > > I'd like to explore simplifications to the facet module's APIs: I > think the current APIs are complex, and the addition of a new feature > (sparse faceting, LUCENE-5333) threatens to add even more classes > (e.g., FacetRequestBuilder). I think we can do better. > So, I've been prototyping some drastic changes; this is very > early/exploratory and I'm not sure where it'll wind up but I think the > new approach shows promise. > The big changes are: > * Instead of *FacetRequest/Params/Result, you directly instantiate > the classes that do facet counting (currently TaxonomyFacetCounts, > RangeFacetCounts or SortedSetDVFacetCounts), passing in the > SimpleFacetsCollector, and then you interact with those classes to > pull labels + values (topN under a path, sparse, specific labels). > * At index time, no more FacetIndexingParams/CategoryListParams; > instead, you make a new SimpleFacetFields and pass it the field it > should store facets + drill downs under. If you want more than > one CLI you create more than one instance of SimpleFacetFields. > * I added a simple schema, where you state which dimensions are > hierarchical or multi-valued. From this we decide how to index > the ordinals (no more OrdinalPolicy). > Sparse faceting is just another method (getAllDims), on both taxonomy > & ssdv facet classes. > I haven't created a common base class / interface for all of the > search-time facet classes, but I think this may be possible/clean, and > perhaps useful for drill sideways. > All the new classes are under oal.facet.simple.*. > Lots of things that don't work yet: drill sideways, complements, > associations, sampling, partitions, etc. This is just a start ... -- This message was sent by Atlassian JIRA (v6.1#6144) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org