[ 
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822360#comment-13822360
 ] 

Michael McCandless commented on LUCENE-5339:
--------------------------------------------

bq. You have a TODO file which seems to have been 'svn added' - are you aware 
of it? 

I put a nocommit.

bq. Maybe we should do this work in a branch and avoid the .simple package? 

I'll start a branch, but on the branch I'd like to keep working on
.simple for now while we bang out the API changes.  If things start to
crystallize then we can start cutting everything else over?

bq. Can FT.FieldType provide a ctor taking these arguments and then 
DEFAULT_FIELD_TYPE pass (false,false)? Or, init those two members to false.
bq. Maybe instead of setHierarchical + setMultiValued you do 
setDimType(DimType)? Then you could avoid the synchronization?

I put a nocommit.

bq. I wonder if FieldTypes won't confuse users w/ Field.FieldType, so maybe you 
should name it DimType or something? And the upper class FacetsConfig?

Good, I renamed both.

bq. Not sure, but I think that SimpleFacetFields adds the dimension's ordinal 
if the dim is both hierarchical and multi-valued? That's a step back from the 
default ALL_BUT_DIM that we have today. I think we might want to have a 
requiresDimValue or something, because I do think the dimension's value (count) 
is most often unneeded, and it's a waste to encode its ordinal?

It does, and I think that's OK?  Yes, it's one extra ord it indexes,
but that will be a minor perf hit since it's a multi-valued and
hierarchical.

bq. Constants isn't documented, not sure if 'precommit' will like that, but in 
general I think the constants should have jdocs. Maybe put a nocommit?

I put nocommits.  Sadly, precommit won't fail (lucene/facet is
"missing" in document-lint target in build.xml, because we never fixed
all its javadocs).  We should separately fix that.

bq. TaxonomyFacetCounts.getAllDims() – If a dimension is not hierarchical, I 
think SimpleFacetResult.count == 0? In that case, sorting by its count is 
useless?  I think it's also relevant for SortedSet?

I THINK these will work correctly (we sum the dim value as we visit
all children), but I was missing the "corrected" dim count in the
hierarchical + MV case so I added that.  Also, I put nocommits to test
this.

bq. LabelAndValue has a Number as the value rather than double. I see this 
class is used only in the final step (labeling), but what's wrong w/ the 
previous 'double' primitive? Is it to avoid casting or something?

I think it's crazy when I ask for counts that I get a double back :)
That's why I switched it to a Number.

bq. CategoryListIterator

I appreciate the usefulness of this API, but rather that adding in
into "simple" from the get-go, I'd like to build out the different
facet methods and understand if it's actually useful / worth the
additional abstraction.

For example, I'm not sure it would work very well with SSDV, since we
first count in seg-ord space an then convert to global-ord space only
when combining counts across segments (this gave better performance).
I mean, yes, it would "work" in that the abstraction would be correct,
but we'd be paying a performance penalty.

bq. E.g. that's how we were able to test fast cutting over facets to DV from 
Payload, that's a nice abstraction for letting you load the values from a 
cache, and so forth.

I think doing such future tests with the simple APIs will still be
easy; I don't think we should open up abstractions for the
encoding/decoding today.

bq. FacetsAggregator

I added another facet method, TaxonomyFacetSumValueSource for value
source aggregation, in the next patch, to explore this need...

bq. But I don't see this convenience getting away (you'll just pass a 
List<FacetsAggregator> and then pull the requested values later on). 

True but ... we need some base class / interface that all these
*Facets.java implement ... I haven't done that yet (it's a TODO).

bq. What I like less about it is that it folds in the logic coded by 
FacetsAccumulator and FacetResultsHandler

Maybe we can move some of these methods into the base class?  I'm not
sure though... since for the TaxonomyFacetSumValueSource it's float[]
values and for the *Count it's int[] counts.

At the end of the day, the code that does the facet counting, the
rollup, pulling the topK, is in fact a small amount of code; I think
replicating bits of this code for the different methods is the lesser
evil than adding so many abstractions that the module is hard to
approach by new devs/users.

bq. Has variants that return a full sub-tree (TopKInEachNodeHandler)

That handler is such an exotic use case ... and the app can just
recurse itself, calling TaxoFacetCounts.getTopChildren?

bq. FacetArrays

We avoid the need to factor this out, by simply enumerating the facet
impls directly.  E.g. we (or a user) can make a TaxoFacetAvgScore that
allocates its own int[] and float[]...

bq.  I do think this class has some value today, e.g. in CategoryListParams 
(which you nuked), the app doesn't need to know about the field under which 
facets are indexed at all, only if it cares to change it or index multiple 
category lists.

The facet field is now optional (defaults to $facets =
Constants.DEFAULT_FACET_FIELD); the TestXXX.testBasic now look quite
simple.

bq. let's put the configuration parameters under it (e.g. DimType and 
drillDownDelimiter)?

I added nocommits; I think it's a good idea to move delim char inside
here; we'd need to enforce that all fields sharing the same indexed
field have the same delim char, but I think that's fine.

What do you mean by DimType?  Taxo vs SSDV?  That's interesting to put
in the DimConfig... it'd mean we (almost?) could pick the right
accum/agg/results impl at search time.


> Simplify the facet module APIs
> ------------------------------
>
>                 Key: LUCENE-5339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5339
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch, LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder).  I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under.  If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued.  From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc.  This is just a start ...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to