Not all of your fields might be strings.
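
As a rough picture of the hierarchical counting discussed in the thread below, here is a toy sketch in plain Java. It does not use Lucene at all: FacetSketch, sampleDocs and countChildren are names made up for this example, and each category is modeled as a simple list of strings (dimension first, then the path). Lucene's real facet implementations work on doc values and ordinals, so treat this only as an illustration of what "counts per child at a given level" means, using the categories from the original question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: NOT the Lucene API. All names here are hypothetical.
public class FacetSketch {

    // Two sample documents; each category is [dimension, path...].
    static List<List<List<String>>> sampleDocs() {
        List<List<String>> doc1 = List.of(
            List.of("media-type", "book"),
            List.of("topic", "physics", "quantum"),
            List.of("author", "Neumann"),
            List.of("author", "Wheeler"));
        List<List<String>> doc2 = List.of(
            List.of("media-type", "magazine"),
            List.of("topic", "physics", "astro"));
        return List.of(doc1, doc2);
    }

    // Counts how many documents fall under each immediate child of
    // (dim, parentPath...). A document is counted at most once per child.
    static Map<String, Integer> countChildren(
            List<List<List<String>>> docs, String dim, String... parentPath) {
        Map<String, Integer> counts = new TreeMap<>();
        int childIndex = parentPath.length + 1; // position of the child label
        for (List<List<String>> doc : docs) {
            Set<String> childrenInDoc = new HashSet<>();
            for (List<String> category : doc) {
                // Skip other dimensions and paths too short to have a child here.
                if (category.size() <= childIndex || !category.get(0).equals(dim)) {
                    continue;
                }
                // Keep only categories whose path starts with parentPath.
                if (category.subList(1, childIndex).equals(Arrays.asList(parentPath))) {
                    childrenInDoc.add(category.get(childIndex));
                }
            }
            for (String child : childrenInDoc) {
                counts.merge(child, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<List<String>>> docs = sampleDocs();
        System.out.println(countChildren(docs, "topic"));            // {physics=2}
        System.out.println(countChildren(docs, "topic", "physics")); // {astro=1, quantum=1}
        System.out.println(countChildren(docs, "author"));           // {Neumann=1, Wheeler=1}
    }
}
```

With the two sample documents, asking for the children of "topic" gives a single bucket (physics), and drilling into topic/physics splits it into quantum and astro. A flat "category" field has no notion of such levels, so the values would all be counted side by side.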
> On Oct 23, 2023, at 1:10 PM, Greg Miller <gsmil...@gmail.com> wrote:
>
> Hey Michael-
>
> You've gotten a lot of great information here already. I'll point you to
> one more implementation as well: StringValueFacetCounts. This
> implementation lets you do faceting over arbitrary "string-like" doc value
> fields (SORTED and SORTED_SET). So if you already have a field of this type
> you're using for other purposes, and you want to do faceting over it, you
> can do it with this implementation.
>
> The faceting-specific fields (there's a taxonomy-based approach and a
> non-taxonomy-based approach, both with pros/cons) are also available, which
> is what you've referenced here so far (and what others have pointed you
> to). These are more "managed" fields with faceting in mind.
>
> A high-level difference here is that faceting-specific fields tend to index
> all the facet fields into a single doc values field in the index, which can
> make faceting more efficient. StringValueFacetCounts can be less efficient
> for faceting (if you have many different fields you want to individually
> facet) but could be more flexible for you if you already have these fields
> in your index for other purposes and don't want to duplicate the data into
> these facet-specific fields.
>
> Not sure if these details are helpful for you or not. If any of this is a
> bit unclear, let me know and I'll try to describe things better or answer
> specific questions. Honestly, we probably have too many ways to do the same
> thing in the faceting module, and maybe our documentation could be a bit
> more helpful.
>
> Cheers,
> -Greg
>
>> On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner <michael.wech...@wyona.com>
>> wrote:
>>
>> thanks very much for this additional information, Marc!
>>
>>> On 20.10.23 at 20:30, Marc D'Mello wrote:
>>> Just following up on Mike's comment:
>>>
>>>> It used to be that the "doc values" based faceting did not support
>>>> arbitrary hierarchy, but I think that was fixed at some point.
>>>
>>> Yeah, it was fixed a year or two ago. SortedSetDocValuesFacetField supports
>>> hierarchical faceting; I think you just need to enable it in the
>>> FacetsConfig. One thing to keep in mind is that even though SSDV faceting
>>> doesn't require a taxonomy index, it still requires a
>>> SortedSetDocValuesReaderState to be maintained, which can be a little bit
>>> expensive to create, but only needs to be done once. This benchmark code
>>> <https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/facets/BenchmarkFacets.java>
>>> serves as a pretty basic example of SSDV/hierarchical SSDV faceting.
>>>
>>> On Fri, Oct 20, 2023 at 7:09 AM Michael Wechner <michael.wech...@wyona.com>
>>> wrote:
>>>
>>>> cool, thank you very much!
>>>>
>>>> Michael
>>>>
>>>> On 20.10.23 at 15:44, Michael McCandless wrote:
>>>>> You can use either the "doc values" implementation for facets
>>>>> (SortedSetDocValuesFacetField), or the "taxonomy" implementation
>>>>> (FacetField, in which case, yes, you need to create a TaxonomyWriter).
>>>>>
>>>>> It used to be that the "doc values" based faceting did not support
>>>>> arbitrary hierarchy, but I think that was fixed at some point.
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <michael.wech...@wyona.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Mike
>>>>>>
>>>>>> Thanks for your feedback!
>>>>>>
>>>>>> IIUC, in order to have the actual advantages of Facets one has to
>>>>>> "connect" it with a TaxonomyWriter
>>>>>>
>>>>>> FacetsConfig config = new FacetsConfig();
>>>>>> DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
>>>>>> indexWriter.addDocument(config.build(taxoWriter, doc));
>>>>>>
>>>>>> right?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> On 20.10.23 at 12:19, Michael McCandless wrote:
>>>>>>> There are some differences.
>>>>>>>
>>>>>>> StringField is indexed into the inverted index (postings) so you can do
>>>>>>> efficient filtering. You can also store it in stored fields to retrieve it.
>>>>>>>
>>>>>>> FacetField does everything StringField does (filtering, storing (maybe?)),
>>>>>>> but in addition it stores data for faceting. I.e. you can compute facet
>>>>>>> counts or simple aggregations at search time.
>>>>>>>
>>>>>>> FacetField is also hierarchical: you can filter and facet by different
>>>>>>> points/levels of your hierarchy.
>>>>>>>
>>>>>>> Mike McCandless
>>>>>>>
>>>>>>> http://blog.mikemccandless.com
>>>>>>>
>>>>>>> On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner <michael.wech...@wyona.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I have found the following simple Facet Example
>>>>>>>>
>>>>>>>> https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java
>>>>>>>>
>>>>>>>> whereas for a simple categorization of documents I currently use
>>>>>>>> StringField, e.g.
>>>>>>>>
>>>>>>>> doc1.add(new StringField("category", "book", Field.Store.NO));
>>>>>>>> doc1.add(new StringField("category", "quantum_physics", Field.Store.NO));
>>>>>>>> doc1.add(new StringField("category", "Neumann", Field.Store.NO));
>>>>>>>> doc1.add(new StringField("category", "Wheeler", Field.Store.NO));
>>>>>>>>
>>>>>>>> doc2.add(new StringField("category", "magazine", Field.Store.NO));
>>>>>>>> doc2.add(new StringField("category", "astro_physics", Field.Store.NO));
>>>>>>>>
>>>>>>>> which works well, but would it be better to use Facets for this, e.g.
>>>>>>>>
>>>>>>>> doc1.add(new FacetField("media-type", "book"));
>>>>>>>> doc1.add(new FacetField("topic", "physics", "quantum"));
>>>>>>>> doc1.add(new FacetField("author", "Neumann"));
>>>>>>>> doc1.add(new FacetField("author", "Wheeler"));
>>>>>>>>
>>>>>>>> doc2.add(new FacetField("media-type", "magazine"));
>>>>>>>> doc2.add(new FacetField("topic", "physics", "astro"));
>>>>>>>>
>>>>>>>> ?
>>>>>>>>
>>>>>>>> IIUC the StringField approach is more general, whereas the FacetField
>>>>>>>> approach allows a more specific categorization / search.
>>>>>>>> Or do I misunderstand this?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org