[jira] [Commented] (LUCENE-5339) Simplify the facet module APIs

Michael McCandless (JIRA) Sun, 17 Nov 2013 04:54:34 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824826#comment-13824826
 ]


Michael McCandless commented on LUCENE-5339:
--------------------------------------------

bq. I was thinking maybe we can wrap an IW,

I think that's a good idea.  I added it to the TODO.

bq. Another crazy idea is to implement a FacetDocument which extends Document 

I'm not sure how this can work, since in order to write the ords we
need to see all FacetFields.  I.e., at what point would we compile all
the FacetFields into the BDV field?
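
To make the concern concrete, here is a minimal sketch of why the
ordinals of every facet field in a document have to be in hand before
the single binary field can be written.  It assumes the ordinals are
deduplicated, sorted, and written as delta-coded variable-length ints;
that layout, the class name OrdinalEncoderSketch, and the method
signature are assumptions for illustration, not the actual FacetIW
code.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not a Lucene class. */
final class OrdinalEncoderSketch {

  /**
   * Merges the (non-negative) ordinals contributed by every facet field of
   * one document, removes duplicates, and encodes the sorted result as
   * delta-coded variable-length ints (7 data bits per byte, high bit set
   * when more bytes follow).
   */
  static byte[] dedupAndEncode(List<int[]> ordsPerField) {
    // TreeSet both deduplicates and sorts, which delta coding requires.
    TreeSet<Integer> unique = new TreeSet<>();
    for (int[] ords : ordsPerField) {
      for (int ord : ords) {
        unique.add(ord);
      }
    }

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : unique) {
      int delta = ord - prev;
      // Emit the low 7 bits first, flagging continuation with the high bit.
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80);
        delta >>>= 7;
      }
      out.write(delta);
      prev = ord;
    }
    return out.toByteArray();
  }
}
```

Because the encoding is relative to the previous ordinal, adding one
more field's ordinals after the fact would mean decoding and
re-encoding the whole value, which is why the fields need to be
compiled in one step.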

bq. we can open up the extension point (I think it's FacetIW.dedupAndEncode) to 
let the app add the DIM ordinal too?

Hmm, we could open that up, but ... I think that's "too late"?  You
can't easily know which dim ords to add back at that point.  I added a
TODO. 

{quote}
bq. Seriously? What abstractions are we missing?

Well, FacetArrays commits to an int[], while we have someone who wants to 
use a Map, because he has millions of facets, yet queries typically hit very 
few of them.
{quote}

With the simplified APIs this user could just make a custom facet
method?
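
As a rough illustration of what such a custom method might keep
internally, the sketch below counts ordinals in a HashMap rather than
a dense int[], so memory grows with the number of ordinals a query
actually touches.  The class SparseCountsSketch and its methods are
hypothetical names, not part of the prototype.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse counter for illustration; not a Lucene class. */
final class SparseCountsSketch {
  // Only ordinals that were actually seen get an entry.
  private final Map<Integer, Integer> counts = new HashMap<>();

  /** Adds one to the count of the given ordinal. */
  void increment(int ord) {
    counts.merge(ord, 1, Integer::sum);
  }

  /** Returns the count for an ordinal, or 0 if it was never seen. */
  int count(int ord) {
    return counts.getOrDefault(ord, 0);
  }

  /** Number of distinct ordinals seen so far. */
  int touched() {
    return counts.size();
  }
}
```

The tradeoff is boxing and hashing overhead per increment, so this
only pays off when hits are sparse relative to the total number of
facet labels.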

bq. Another abstraction is LUCENE-5316, which we struggle to get to perform 
(really frustrating!).

I agree we need a better abstraction here... the 3 ints we require per
unique facet label are costly.  But, I don't think we need to force
SSDVFacets to use this abstraction?

bq. Why have two faceting modules? A branch is the perfect choice here, since 
it allows us to move the entire module to the new API. And on the way we 
benefit by assessing that the new API can really allow implementing 
associations, complements, sampling.

I would also prefer to have a single facet module after all this, but
if the requirements (rich functionality vs. simple API) are too
divergent, then two modules is at least an option.  Progress not
perfection...

bq.  I may compromise on complements and sampling because: (1) complements is 
not per-segment and I think it might require a full rewrite anyway (there's 
an issue for it) and (2) sampling because it's under the *.old package, meaning 
it was never fully migrated to the new APIs and I think it could use some 
serious simplification there too.

I agree it's bad that complements is "top level"; everything else in
the facet module is NRT friendly and I think we should stick with
that.

I'll work on associations...

bq. I personally like code reuse, and it kills me to see code that's 
duplicated. Often this calls out for bad design of APIs, not necessarily 
simplification.

I think this is a precarious balance.  If a little code dup can
greatly simplify the APIs, then that's the better tradeoff.

bq. Wrong? I mean you still didn't write a version which pulls the ords from 
OrdinalsCache (and I assume you don't want to get rid of it!).

You're right, the ords cache filling will be another place that "bakes
in" the decoding.  So, I agree: if we can find a clean way to abstract
the encoding/source then let's pursue that.
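
One possible shape for that abstraction, purely as a sketch: the
counting loop asks a source for a document's ordinals and does not
care whether they are decoded from bytes on the fly or read from a
pre-filled cache.  All names here (OrdinalsSourceSketch,
EncodedOrdinalsSource, CachedOrdinalsSource, CountingSketch) are
hypothetical, and the delta-coded vInt layout is an assumption for
the example.

```java
import java.util.function.IntConsumer;

/** Hypothetical abstraction over where a document's ordinals come from. */
interface OrdinalsSourceSketch {
  void forEachOrdinal(int docID, IntConsumer consumer);
}

/** Decodes delta-coded vInt ordinals from per-document bytes on each call. */
final class EncodedOrdinalsSource implements OrdinalsSourceSketch {
  private final byte[][] encodedPerDoc;

  EncodedOrdinalsSource(byte[][] encodedPerDoc) {
    this.encodedPerDoc = encodedPerDoc;
  }

  @Override
  public void forEachOrdinal(int docID, IntConsumer consumer) {
    byte[] bytes = encodedPerDoc[docID];
    int prev = 0;
    int pos = 0;
    while (pos < bytes.length) {
      int delta = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        delta |= (b & 0x7F) << shift; // low 7 bits carry data
        shift += 7;
      } while ((b & 0x80) != 0);      // high bit means "more bytes follow"
      prev += delta;
      consumer.accept(prev);
    }
  }
}

/** Serves ordinals from an already-decoded in-memory cache. */
final class CachedOrdinalsSource implements OrdinalsSourceSketch {
  private final int[][] ordsPerDoc;

  CachedOrdinalsSource(int[][] ordsPerDoc) {
    this.ordsPerDoc = ordsPerDoc;
  }

  @Override
  public void forEachOrdinal(int docID, IntConsumer consumer) {
    for (int ord : ordsPerDoc[docID]) {
      consumer.accept(ord);
    }
  }
}

/** The counting loop is written once, against the interface only. */
final class CountingSketch {
  static int[] count(OrdinalsSourceSketch source, int[] hits, int numOrds) {
    int[] counts = new int[numOrds];
    for (int doc : hits) {
      source.forEachOrdinal(doc, ord -> counts[ord]++);
    }
    return counts;
  }
}
```

Whether the per-ordinal callback is cheap enough in the hot loop is
exactly the kind of thing that would need benchmarking before
committing to an interface like this.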


> Simplify the facet module APIs
> ------------------------------
>
>                 Key: LUCENE-5339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5339
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch, LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder).  I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under.  If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued.  From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
