[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822721#comment-13822721
 ] 

Michael McCandless commented on LUCENE-5339:
--------------------------------------------

Thanks for the feedback Shai.

bq. I think FacetField should an optional ctor taking the indexedFacetField, 
defaulting to $facets, then the ctor calls super() with the right field, and 
not "dummy"? And you can remove set/get?

I moved the indexed field name to the DimConfig.

bq. SimpleFacetsCollector jdocs are wrong – there's no create?

I removed it and put nocommit.

bq. Do we still need SSDVFacetFields?

Woops, no; I removed it.

bq. I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes 
me think if down the road we won't be sorry about doing this change (i.e. if 
anything changes on IW APIs). The good thing about FacetFields is that it just 
adds fields to a Document, and doesn't worry about IW API at all...

I think that's an acceptable risk in exchange for the simpler
index-time API.

bq. DimType == DimConfig . Sorry if I wasn't clear somewhere in my long 
response.

Ahh, OK.  I'm still wondering if we should put the "facetMethod" onto
the DimConfig...

bq. Like if your facet has 2 levels, that's 33% more ords. I think the count of 
the root ord is most likely never needed? And if it's needed, app can compute 
it by traversing its children and their values in the facet arrays? Maybe as a 
default we just never index it, and don't add a vague 
requiresDimCount/Value/Weight boolean?

Wait, the app cannot compute this (accurately) by summing the child counts?  It 
will overcount in general, right?

{quote}
bq. I think replicating bits of this code for the different methods is the 
lesser evil than adding so many abstractions that the module is hard to 
approach by new devs/users.

Did we ever get such feedback from users? That the module is unapproachable?
{quote}

It's mostly from my own assessment, looking at the code, and my own
frustrations over time in trying to improve the facets module;
LUCENE-5333 was finally the straw (for me)... I find complex APIs
frustrating and I think it's a serious barrier to new contributors
getting involved and users consuming them.

There was a user on the ML (I don't have the link) who just wanted to
get the facet count for a specific label after faceting was done, and
the hoops s/he had to jump through (custom FacetResultHandler I
think??) to achieve that was just crazy.

bq. I get the opposite feedback - that we don't have many abstractions! 

Seriously?  What abstractions are we missing?

If this is too much change for the facets module, we could, instead,
leave the facets module as is, and break this effort out as a
different module (facets2, simplefacets, something?).  We also have
many queryparsers, many highlighters, etc., and I think that's
healthy: all options can be explored.

bq. You have a nocommit "maybe we should do this lazily" in regards for when to 
rollupValues. That shows me that now every developer who extends this API 
(let's be clear - users are oblivious to this happening) will facet the same 
decision (nocommit). If we discover one day that it's better to rollup lazily 
or not, other developers don't benefit from that decision. That's why I think 
some abstractions are good.

It's a crazy expert thing to create another faceting impl, so I think
such developers can handle changing their rollup to be lazy if it's
beneficial/necessary for their use case.

bq. I'm not sure we should do that (cut over associations later). The whole 
point about these features (associations, complements, sampling..) is that they 
are existing features. If we think they are useless / unneeded - that's one 
thing. But if we believe they are important, it's useless to make all the API 
changes without taking them into account, only to figure out later that we need 
abstraction X and Y in order to implement them.

Well this could be a good argument for just making a new module?

The new module would have a simpler API and less thorough
functionality?

bq. Can we do FacetIndexWriter in a separate issue (if we want to do it at 
all)? It's unrelated to the search API changes you want to do here, and it 
might be easier to contain within a single issue?

I'm not sure it's so easily separated out; the DimConfig is common to
index time and search time, and we're still iterating on that (I just
moved the indexedFieldName into it).

bq. About CategoryListIterator ... what if we do manage to come up tomorrow 
with a better encoding strategy for facets. Do you really think that changing 
all existing WhateverFacets makes sense!? And if developers write their own 
WhateverFacets, it means they need to change their code too? Really, you're 
mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that 
there are apps that can benefit from a different encoding scheme (e.g. 
FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better 
default encoding scheme to introduce abstractions. I mean .. that's just sounds 
crazy to me.

We'd have to change 3 places if we did that right now: FacetIndexWriter
(where we encode) and TaxonomyFacetCounts/SumValueSource (where we
decode).

Maybe... we can have a decode method in TFC, and TFSVC subclasses TFC,
or ... something ... but I'd like to be very careful in just adding
back abstractions here: this really is a small amount of code at the
end of the day.


> Simplify the facet module APIs
> ------------------------------
>
>                 Key: LUCENE-5339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5339
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch, LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder).  I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under.  If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued.  From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to