Not all of your fields might be strings.

> On Oct 23, 2023, at 1:10 PM, Greg Miller <gsmil...@gmail.com> wrote:
> 
> Hey Michael-
> 
> You've gotten a lot of great information here already. I'll point you to
> one more implementation as well: StringValueFacetCounts. This
> implementation lets you do faceting over arbitrary "string-like" doc value
> fields (SORTED and SORTED_SET). So if you already have a field of this type
> you're using for other purposes, and you want to do faceting over it, you
> can do it with this implementation.
> 
> The faceting-specific fields (there's a taxonomy-based approach and a
> non-taxonomy-based approach, both with pros/cons) are also available, which
> is what you've referenced here so far (and what others have pointed you
> to). These are more "managed" fields with faceting in mind.
> 
> A high-level difference here is that faceting-specific fields tend to index
> all the facet fields into a single doc values field in the index, which can
> make faceting more efficient. StringValueFacetCounts can be less efficient
> for faceting (if you have many different fields you want to individually
> facet) but could be more flexible for you if you already have these fields
> in your index for other purposes and don't want to duplicate the data into
> these facet-specific fields.
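A rough way to picture the trade-off Greg describes, as a plain-Java toy model rather than Lucene code: in the combined layout every document carries one list of "dimension/value" labels, so a single pass over one structure yields counts for all dimensions, while in the per-field layout each field is its own structure and is walked separately. The names below (FacetLayoutSketch, countCombined, countPerField) are made up for illustration, and the lists merely stand in for doc values columns.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain collections stand in for doc values columns.
final class FacetLayoutSketch {

    // Combined layout: one multi-valued column holds the "dim/value" labels of all dimensions.
    static Map<String, Integer> countCombined(List<List<String>> combinedColumn) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> docLabels : combinedColumn) {
            for (String label : docLabels) {
                counts.merge(label, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Per-field layout: one column per field, each walked in its own pass.
    static Map<String, Integer> countPerField(Map<String, List<List<String>>> columns) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> column : columns.entrySet()) {
            for (List<String> docValues : column.getValue()) {
                for (String value : docValues) {
                    counts.merge(column.getKey() + "/" + value, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two documents, modeled on the doc1/doc2 example from the original question.
        List<List<String>> combined = Arrays.asList(
            Arrays.asList("media-type/book", "author/Neumann", "author/Wheeler"),
            Arrays.asList("media-type/magazine"));

        Map<String, List<List<String>>> perField = new LinkedHashMap<>();
        perField.put("media-type", Arrays.asList(
            Arrays.asList("book"), Arrays.asList("magazine")));
        perField.put("author", Arrays.asList(
            Arrays.asList("Neumann", "Wheeler"), Collections.<String>emptyList()));

        System.out.println(countCombined(combined));
        System.out.println(countPerField(perField));
    }
}
```

Both methods produce the same counts. The only difference the model shows is how many separate structures have to be visited, which is the efficiency point made above.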
> 
> Not sure if these details are helpful for you or not. If any of this is a
> bit unclear, let me know and I'll try to describe things better or answer
> specific questions. Honestly, we probably have too many ways to do the same
> thing in the faceting module, and maybe our documentation could be a bit
> more helpful.
> 
> Cheers,
> -Greg
> 
>> On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner <michael.wech...@wyona.com>
>> wrote:
>> 
>> thanks very much for this additional information, Marc!
>> 
>>> On 20.10.23 at 20:30, Marc D'Mello wrote:
>>> Just following up on Mike's comment:
>>> 
>>> 
>>>> It used to be that the "doc values" based faceting did not support
>>>> arbitrary hierarchy, but I think that was fixed at some point.
>>> 
>>> 
>>> Yeah, it was fixed a year or two ago: SortedSetDocValuesFacetField supports
>>> hierarchical faceting; I think you just need to enable it in the
>>> FacetsConfig. One thing to keep in mind is that even though SSDV faceting
>>> doesn't require a taxonomy index, it still requires a
>>> SortedSetDocValuesReaderState to be maintained, which can be a little bit
>>> expensive to create, but only needs to be done once. This benchmark code
>>> <https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/facets/BenchmarkFacets.java>
>>> serves as a pretty basic example of SSDV/hierarchical SSDV faceting.
>>> 
>>> On Fri, Oct 20, 2023 at 7:09 AM Michael Wechner <michael.wech...@wyona.com>
>>> wrote:
>>> 
>>>> cool, thank you very much!
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> 
>>>> On 20.10.23 at 15:44, Michael McCandless wrote:
>>>>> You can use either the "doc values" implementation for facets
>>>>> (SortedSetDocValuesFacetField), or the "taxonomy" implementation
>>>>> (FacetField, in which case, yes, you need to create a TaxonomyWriter).
>>>>> 
>>>>> It used to be that the "doc values" based faceting did not support
>>>>> arbitrary hierarchy, but I think that was fixed at some point.
>>>>> 
>>>>> Mike McCandless
>>>>> 
>>>>> http://blog.mikemccandless.com
>>>>> 
>>>>> 
>>>>> On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <michael.wech...@wyona.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi Mike
>>>>>> 
>>>>>> Thanks for your feedback!
>>>>>> 
>>>>>> IIUC, in order to get the actual advantages of Facets, one has to
>>>>>> "connect" it with a TaxonomyWriter:
>>>>>> 
>>>>>> FacetsConfig config = new FacetsConfig();
>>>>>> DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
>>>>>> indexWriter.addDocument(config.build(taxoWriter, doc));
>>>>>> 
>>>>>> right?
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 20.10.23 at 12:19, Michael McCandless wrote:
>>>>>>> There are some differences.
>>>>>>> 
>>>>>>> StringField is indexed into the inverted index (postings) so you can do
>>>>>>> efficient filtering.  You can also store it in stored fields to retrieve it.
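To illustrate why an inverted index makes filtering cheap, here is a toy plain-Java model of postings. It is not Lucene's actual data structure: each term simply maps to the sorted ids of the documents that contain it, so filtering by a category is a single lookup. PostingsSketch, index and filter are hypothetical names, and the ids 1 and 2 stand in for doc1 and doc2 from the original question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: term -> sorted ids of the documents containing that term.
final class PostingsSketch {

    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Adds one document; every term gets the document id appended to its postings.
    void index(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Filtering is a direct lookup of the term's postings (empty if the term is unknown).
    SortedSet<Integer> filter(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySortedSet());
    }

    public static void main(String[] args) {
        PostingsSketch idx = new PostingsSketch();
        idx.index(1, "book", "quantum_physics", "Neumann", "Wheeler");
        idx.index(2, "magazine", "astro_physics");
        System.out.println(idx.filter("book")); // prints [1]
    }
}
```

Note that this structure answers "which documents have this term?" directly, but gives no cheap way to ask "which terms occur in these matching documents, and how often?", which is the question faceting has to answer.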
>>>>>>> 
>>>>>>> FacetField does everything StringField does (filtering, storing (maybe?)),
>>>>>>> but in addition it stores data for faceting.  I.e. you can compute facet
>>>>>>> counts or simple aggregations at search time.
>>>>>>> 
>>>>>>> FacetField is also hierarchical: you can filter and facet by different
>>>>>>> points/levels of your hierarchy.
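What "facet by different levels" amounts to can be sketched in plain Java. This is a conceptual model under the assumption that a document counts once toward every prefix of each of its facet paths; it does not use or mirror Lucene's FacetField internals, and HierarchicalFacetSketch, countFacets and sampleDocs are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Conceptual model of hierarchical facet counts; not Lucene's implementation.
final class HierarchicalFacetSketch {

    // For every prefix of every facet path, counts how many documents carry it.
    static Map<String, Integer> countFacets(List<List<String[]>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String[]> doc : docs) {
            Set<String> seen = new HashSet<>(); // each prefix counts once per document
            for (String[] path : doc) {
                StringBuilder prefix = new StringBuilder();
                for (String part : path) {
                    if (prefix.length() > 0) {
                        prefix.append('/');
                    }
                    prefix.append(part);
                    seen.add(prefix.toString());
                }
            }
            for (String prefix : seen) {
                counts.merge(prefix, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The two example documents from the original question, as dimension + path arrays.
    static List<List<String[]>> sampleDocs() {
        List<String[]> doc1 = Arrays.asList(
            new String[] {"media-type", "book"},
            new String[] {"topic", "physics", "quantum"},
            new String[] {"author", "Neumann"},
            new String[] {"author", "Wheeler"});
        List<String[]> doc2 = Arrays.asList(
            new String[] {"media-type", "magazine"},
            new String[] {"topic", "physics", "astro"});
        return Arrays.asList(doc1, doc2);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countFacets(sampleDocs());
        System.out.println(counts.get("topic/physics"));         // prints 2
        System.out.println(counts.get("topic/physics/quantum")); // prints 1
    }
}
```

With flat category strings such as "quantum_physics" and "astro_physics", the shared "physics" level would not be countable without parsing the strings; keeping the path components separate is what makes the intermediate level available.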
>>>>>>> 
>>>>>>> Mike McCandless
>>>>>>> 
>>>>>>> http://blog.mikemccandless.com
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner <michael.wech...@wyona.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi
>>>>>>>> 
>>>>>>>> I have found the following simple Facet example:
>>>>>>>> 
>>>>>>>> https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java
>>>>>>>> 
>>>>>>>> whereas for a simple categorization of documents I currently use
>>>>>>>> StringField, e.g.
>>>>>>>> 
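>>>>>>>> // StringField requires a Store argument; Field.Store.NO is an assumption here
>>>>>>>> doc1.add(new StringField("category", "book", Field.Store.NO));
>>>>>>>> doc1.add(new StringField("category", "quantum_physics", Field.Store.NO));
>>>>>>>> doc1.add(new StringField("category", "Neumann", Field.Store.NO));
>>>>>>>> doc1.add(new StringField("category", "Wheeler", Field.Store.NO));
>>>>>>>> 
>>>>>>>> doc2.add(new StringField("category", "magazine", Field.Store.NO));
>>>>>>>> doc2.add(new StringField("category", "astro_physics", Field.Store.NO));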
>>>>>>>> 
>>>>>>>> which works well, but would it be better to use Facets for this, e.g.
>>>>>>>> 
>>>>>>>> doc1.add(new FacetField("media-type", "book"));
>>>>>>>> doc1.add(new FacetField("topic", "physics", "quantum"));
>>>>>>>> doc1.add(new FacetField("author", "Neumann"));
>>>>>>>> doc1.add(new FacetField("author", "Wheeler"));
>>>>>>>> 
>>>>>>>> doc2.add(new FacetField("media-type", "magazine"));
>>>>>>>> doc2.add(new FacetField("topic", "physics", "astro"));
>>>>>>>> ?
>>>>>>>> 
>>>>>>>> IIUC the StringField approach is more general, whereas the FacetField
>>>>>>>> approach allows a more specific categorization / search.
>>>>>>>> Or do I misunderstand this?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> Michael
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
