[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822450#comment-13822450
 ] 

Shai Erera commented on LUCENE-5339:
------------------------------------

About the patch:

* I think FacetField should an optional ctor taking the indexedFacetField, 
defaulting to $facets, then the ctor calls super() with the right field, and 
not "dummy"? And you can remove set/get?

* SimpleFacetsCollector jdocs are wrong -- there's no create?

* Do we still need SSDVFacetFields?

* I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me 
think if down the road we won't be sorry about doing this change (i.e. if 
anything changes on IW APIs). The good thing about FacetFields is that it just 
adds fields to a Document, and doesn't worry about IW API at all...

* DimType == DimConfig :). Sorry if I wasn't clear somewhere in my long 
response.

{quote}
That handler is such an exotic use case ... and the app can just
recurse itself, calling TaxoFacetCounts.getTopChildren?
{quote}

Could be, maybe it will work. It definitely allows asking for different topK 
for each child (something we currently don't support).

{quote}
It does, and I think that's OK? Yes, it's one extra ord it indexes,
but that will be a minor perf hit since it's a multi-valued and
hierarchical.
{quote}

I don't know. Like if your facet has 2 levels, that's 33% more ords. I think 
the count of the root ord is most likely never needed? And if it's needed, app 
can compute it by traversing its children and their values in the facet arrays? 
Maybe as a default we just never index it, and don't add a vague 
requiresDimCount/Value/Weight boolean?

{quote}
I think replicating bits of this code for the different methods is the lesser
evil than adding so many abstractions that the module is hard to
approach by new devs/users.
{quote}

Did we ever get such feedback from users? That the module is unapproachable?
I get the opposite feedback - that we don't have many abstractions! :)

{quote}
At the end of the day, the code that does the facet counting, the
rollup, pulling the topK, is in fact a small amount of code;
{quote}

You have a nocommit "maybe we should do this lazily" in regards for when to 
rollupValues. That shows me that now every developer who extends this API 
(let's be clear - users are oblivious to this happening) will facet the same 
decision (nocommit). If we discover one day that it's better to rollup lazily 
or not, other developers don't benefit from that decision. That's why I think 
some abstractions are good.

{quote}
I added ValueSource aggregation in the next patch, but not
associations; I think associations can come later (it's just another
index time and search time impl).
{quote}

I'm not sure we should do that (cut over associations later). The whole point 
about these features (associations, complements, sampling..) is that they are 
existing features. If we think they are useless / unneeded - that's one thing. 
But if we believe they are important, it's useless to make all the API changes 
without taking them into account, only to figure out later that we need 
abstraction X and Y in order to implement them.

And we make heavy use of associations, and some users asked (and use) sampling 
and I remember a question about complements. So obviously we cannot conclude 
that these are useless features. Therefore I think it's important that we try 
to tackle them now, so don't we don't do a full round trip to find ourselves 
with the same API again.

Can we do FacetIndexWriter in a separate issue (if we want to do it at all)? 
It's unrelated to the search API changes you want to do here, and it might be 
easier to contain within a single issue?

About CategoryListIterator ... what if we do manage to come up tomorrow with a 
better encoding strategy for facets. Do you really think that changing all 
existing WhateverFacets makes sense!? And if developers write their own 
WhateverFacets, it means they need to change their code too? Really, you're 
mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that 
there are apps that can benefit from a different encoding scheme (e.g. 
FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better 
default encoding scheme to introduce abstractions. I mean .. that's just sounds 
crazy to me.

> Simplify the facet module APIs
> ------------------------------
>
>                 Key: LUCENE-5339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5339
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch, LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder).  I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under.  If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued.  From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to