[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822232#comment-13822232
 ] 

Shai Erera commented on LUCENE-5339:
------------------------------------

bq. Can we rename CategoryPath to FacetLabel or something more intuitive?

CategoryPath is all about the path components of a category. So the name is not 
entirely out of place. But FacetLabel is a nice too, and definitely aligns w/ 
the rest of the APi, e.g. FacetResultNode has a 'label' member, not 'path'. 
But, CP is such a core class, that renaming it will change all of the facet 
module's classes and every app out there (while the changes in this issue are 
less drastic), so I just wonder if it's a must or nice-to-have. I don't object 
to it, just saying it's a rote renaming of a core class. Anyway, if we want to 
do this, I suggest we do it under a separate issue, just to keep things 
contained.

bq. Can there be sugar like addFields(doc, FacetLabel) so a user doesnt have to 
make lists etc?

I think the varargs version is good to have, though in practice (except for 
tests), I never found the need to use one because usually the facets are added 
by reading them from somewhere, so you anyway need to construct a List or 
Array. It's only in tests, mock demos, or very specific applications that such 
a vararg method will be useful. If we have a super FacetFields class, then this 
method could just be final and delegate to the List variant as you noted in the 
code example. Otherwise, we just need to make sure we add it to every FF, so 
we're consistent.

bq. new LongRange("less than 10", 0L, true, 10L, false) <-- can we make it so 
this is less arguments?

You mean omitting the inclusive params, maybe defaulting to true, or maybe 
minInclusive=true and maxInclusive=false (really depends how you view the max 
value)? I guess we could do that. But then we should do that in all other Range 
types (there are only 3).

bq. For the Taxo case, I think the "document build" process is still too 
complicated.

Why the "Taxo" case? TaxoIndex and SortedSet have similar APIs -- 
SimpleFacetFields (maybe we want to rename it to TaxonomyFacetFields) and 
SortedSetDVFacetFields. Both are needed because they add both drill-down terms 
and BinaryDV (Taxo) or SortedSetDV (SortedSet) fields.

A FacetIndexWriter is a nice idea, but we'd need variants for Taxo, SortedSet 
and Mike explores an enum-method (LUCENE-5326). What I like less about it is 
that today the app can use two FacetFields, say one for Enum and one for 
SortedSet (or whatever other combination) and I don't see how it will do that 
using FacetIW.

Hmm ... or what we could do is have a single FacetIW, but different field 
types, e.g. TaxonomyFacetField, SortedSetFacetField and EnumFacetField. FacetIW 
will capture them and act accordingly. From the app's perspective, the 
interaction w/ the APIs is simpler, since it just adds another field to the 
document. Then we don't even need to change FacetFields APIs (varargs or not). 
I think I like it! Can we do this in a separate issue? I think it's lower 
hanging fruit than the changes proposed in this issue, and are also focused 
(nuking FacetFields essentially in exchange for dedicated Field extensions).

What I also like about FacetIW is that if we want to, we can wrap the app's 
Codec with a more suitable FacetCodec. E.g. maybe there's a better way to 
encode the facet information than a plain BinaryDV (maybe it's a special BDV). 
Asking an app to set a Codec is rather high entry-point. Note that I'm not 
talking about apps that e.g. want to configure the facet DV to use DirectDVF. 
That's ok. I'm talking about if we think there's a better way to encode the 
data for *all* (or maybe depending on a schema) faceted-search apps - then 
requiring a special Codec even less approachable API.

> Simplify the facet module APIs
> ------------------------------
>
>                 Key: LUCENE-5339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5339
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder).  I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under.  If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued.  From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to