Proposed changes to the Facet API to support this new feature:
Current API:
struct Facet {
  1:string queryStr,
  2:i64 minimumNumberOfBlurResults = 9223372036854775807
}
struct BlurQuery {
  ...
  3:list<Facet> facets,
  ...
}
struct BlurResults {
  ...
  4:list<i64> facetCounts,
  ...
}
Changed API:
enum FacetType {
  QUERY,
  TERM_ENUM
}
struct Facet {
  1:string queryStr,
  2:i64 minimumNumberOfBlurResults = 9223372036854775807,
  3:FacetType type = QUERY // Facet type
}
struct BlurQuery {
  ...
  3:map<string,Facet> facets, // Named facets
  ...
}
struct FacetResult {
  1:i64 count, // For the standard query facet
  2:map<string,i64> termCounts // For the term enum type
}
struct BlurResults {
  ...
  4:map<string,FacetResult> facetResults, // Named results
  ...
}
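
For illustration, here is a rough sketch of how client code might read results under the changed API. The classes are hand-written stand-ins for what Thrift would generate, not the real Blur classes. The field name tweets.lang and all counts are made up, and I'm assuming queryStr would carry the field name for TERM_ENUM facets, which the structs above don't settle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NamedFacetSketch {

  // Hand-written stand-ins for the proposed Thrift types; real code would use
  // the classes generated by the Thrift compiler.
  public enum FacetType { QUERY, TERM_ENUM }

  public static final class Facet {
    public final String queryStr;
    public final long minimumNumberOfBlurResults;
    public final FacetType type;

    public Facet(String queryStr, long minimumNumberOfBlurResults, FacetType type) {
      this.queryStr = queryStr;
      this.minimumNumberOfBlurResults = minimumNumberOfBlurResults;
      this.type = type;
    }
  }

  public static final class FacetResult {
    public final long count;                   // used by QUERY facets
    public final Map<String, Long> termCounts; // used by TERM_ENUM facets

    public FacetResult(long count, Map<String, Long> termCounts) {
      this.count = count;
      this.termCounts = termCounts;
    }
  }

  // Builds the named facet map that would be set on BlurQuery.facets.
  public static Map<String, Facet> buildFacets() {
    Map<String, Facet> facets = new LinkedHashMap<>();
    facets.put("hadoop", new Facet("tweets.text:hadoop", Long.MAX_VALUE, FacetType.QUERY));
    // Assumption: for TERM_ENUM, queryStr names the field to enumerate.
    facets.put("lang", new Facet("tweets.lang", Long.MAX_VALUE, FacetType.TERM_ENUM));
    return facets;
  }

  // Formats one named result, choosing the field that matches the facet type.
  public static String describe(String name, Facet facet, FacetResult result) {
    if (facet.type == FacetType.TERM_ENUM) {
      return name + ": " + result.termCounts;
    }
    return name + ": " + result.count;
  }

  public static void main(String[] args) {
    Map<String, Facet> facets = buildFacets();

    // Placeholder results standing in for BlurResults.facetResults;
    // the counts are invented for the example.
    Map<String, Long> langCounts = new LinkedHashMap<>();
    langCounts.put("en", 3L);
    langCounts.put("de", 1L);
    Map<String, FacetResult> facetResults = new LinkedHashMap<>();
    facetResults.put("hadoop", new FacetResult(4L, null));
    facetResults.put("lang", new FacetResult(0L, langCounts));

    // Results are looked up by facet name rather than by list position.
    for (Map.Entry<String, Facet> e : facets.entrySet()) {
      System.out.println(describe(e.getKey(), e.getValue(), facetResults.get(e.getKey())));
    }
  }
}
```

The point of naming the facets is that the caller no longer has to keep the result list aligned with the order of the facet list; each result is fetched by the same key used in the query.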
Thoughts?
On Thu, Sep 27, 2012 at 2:39 PM, Garrett Barton <[email protected]> wrote:
> I too want similar functionality. The first thing I would like to see is a
> simple ordered list of all terms in a field, returned with counts. I think
> this would probably be enabled through the analyzer definition at index
> creation time. Make someone consciously decide they want to take the
> calculation hit instead of putting the load on the shard servers. Also,
> isn't it faster right now to just execute additional queries and use the
> hit counts than to load up one query with the facets?
>
> The second thing is not faceting directly; I just happen to be using it
> with facets all the time. I like to try to find the distinct values (and
> their counts) of a field for a given query, for filtering. Right now I plow
> through some multiple of the results I return to try to get a mostly
> complete list of terms, but this is obviously not the complete list. Is
> there a way to get that list, or an API call that lets me send that query
> to the shards?
>
> Thanks for listening!
> Garrett
>
> On Thursday, September 27, 2012, Aaron McCurry <[email protected]> wrote:
> > Yep. We can build it, but I think there need to be some limits placed on
> > how many terms can be enumerated. I would hate to have someone pick a
> > primary key field to enumerate on and blow up the server. I think the
> > easiest way to do it would be to expand the terms in the field on the
> > shard server and run the current faceting query on those expanded terms.
> > I think that is the easy part. The hard part is going to be how we modify
> > the facet API in Thrift to accept the new facet type and how to return
> > the facet results. How would you want the result API to look?
> >
> > Aaron
> >
> > On Thu, Sep 27, 2012 at 1:27 PM, Tim Williams <[email protected]> wrote:
> >
> >> On Tue, Sep 18, 2012 at 10:42 AM, Aaron McCurry <[email protected]> wrote:
> >> > In the BlurQuery object, add Facet objects to the facet list, where
> >> > the Facet object contains the query that you want to facet on. For
> >> > example:
> >> >
> >> > BlurQuery bq = new BlurQuery();
> >> > // The long is the minimum number of results in the facet to return.
> >> > // So if the value was set to 10, the facet object would stop counting
> >> > // the facet at 10. Note: It's very likely that you will get more than
> >> > // your minimum back.
> >> > bq.addFacet(new Facet("tweets.text:hadoop", Long.MAX_VALUE));
> >> >
> >> > BlurResults results = client.query("table", bq);
> >> > List<Long> counts = results.getFacetCounts();
> >> > // The index of each result matches the index of the corresponding
> >> > // facet object in the query.
> >> > long hadoopCount = counts.get(0);
> >> >
> >> > Hope this helps, let me know if you have any more questions.
> >>
> >> Thanks it does. I'm in need of the other kind of faceting, where a
> >> facet is essentially the distinct values for a field relative to a
> >> given query. Something like Solr's Enum-Based Field Faceting[1]. Any
> >> pointers for how I could implement that inside Blur? The only thing I
> >> can come up with is outside Blur and seems inefficient - essentially
> >> record distinct values for the fields of interest at ingest time; then
> >> use those values in Blur's existing facet query to get the counts. I'm
> >> guessing there's a better approach?
> >>
> >> Thanks,
> >> --tim
> >>
> >> [1] - http://wiki.apache.org/solr/SolrFacetingOverview
> >>
> >
>