Thank you so much, Shai...

Chitra

On Wed, Nov 30, 2016 at 2:17 PM, Shai Erera <ser...@gmail.com> wrote:

> This feature is not available in Lucene currently, but it shouldn't be hard
> to add it. See Mike's comment here:
> http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html?showComment=1412777154420#c363162440067733144
>
> One more tricky (yet nicer) feature would be to have it all in one go, i.e.
> you'd say something like "facet on field price" and you'd get "interesting"
> buckets, per the variance in the results.
>
> But before that, we could have a StatsFacets in Lucene which provides some
> statistics about a numeric field (min/max/avg etc.).
>
> On Wed, Nov 30, 2016 at 7:50 AM Chitra R <chithu.r...@gmail.com> wrote:
>
> > Thank you so much, Mike... Hope, gained a lot of stuff on Doc
> > Values faceting and also clarified all my doubts. Thanks..!!
> >
> > *Another use case:*
> >
> > After getting matching documents for the given query, is there any way to
> > calculate min and max values on a NumericDocValuesField (say a date field)?
> >
> > I would like to implement it in numeric range faceting by splitting the
> > numeric values (obtained from the resulting documents) into ranges.
> >
> > Chitra
> >
> > On Wed, Nov 30, 2016 at 3:51 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> >
> > > Doc values fields are never loaded into memory; at most some small
> > > index structures are.
> > >
> > > When you use those fields, the bytes (for just the one doc values
> > > field you are using) are pulled from disk, and the OS will cache them
> > > in memory if available.
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > > On Mon, Nov 28, 2016 at 6:01 AM, Chitra R <chithu.r...@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > When opening SortedSetDocValuesReaderState at search time, is the
> > > > information in the whole doc values files (.dvd & .dvm) loaded into
> > > > memory, or only the information for the specified field (say the
> > > > $facets field)?
> > > >
> > > > Any help is much appreciated.
> > > >
> > > > Regards,
> > > > Chitra
> > > >
> > > > On Tue, Nov 22, 2016 at 5:47 PM, Chitra R <chithu.r...@gmail.com> wrote:
> > > >>
> > > >> Kindly post your suggestions.
> > > >>
> > > >> Regards,
> > > >> Chitra
> > > >>
> > > >> On Sat, Nov 19, 2016 at 1:38 PM, Chitra R <chithu.r...@gmail.com> wrote:
> > > >>>
> > > >>> Hey, I got it clearly. Thank you so much. Could you please help us to
> > > >>> implement it in our use case?
> > > >>>
> > > >>> In our case, we are having a dynamic index and it is variable depth too.
> > > >>> So flat facets are enough. No need of hierarchical facets.
> > > >>>
> > > >>> What I think is,
> > > >>>
> > > >>> Index my facet field as a normal doc values field, so that no special
> > > >>> operation (like taxonomy and sorted set doc values facet field) will be
> > > >>> done at index time and only the doc values field stores its ordinals in
> > > >>> their respective field.
> > > >>>
> > > >>> At search time, I will pass the query (user search query) and filter
> > > >>> (path traversed list) and collect the matching documents in
> > > >>> FacetsCollector.
> > > >>>
> > > >>> To compute the facet count for the specific field, I will gather those
> > > >>> resulting docs, then move through each segment to collect the matching
> > > >>> ordinals using AtomicReader.
> > > >>>
> > > >>> And I know that when I use this, I can't calculate facet counts for more
> > > >>> than one field (facet) in a search.
> > > >>>
> > > >>> Instead of loading all the dimensions in DocValuesReaderState (will take
> > > >>> more time and memory) at search time, loading specific fields will take
> > > >>> less time and memory, hope so. Kindly help to solve.
> > > >>>
> > > >>> It will do it at a minimal index and search cost, I think. And hope this
> > > >>> won't put overload at index time, also at search time this will be
> > > >>> better.
> > > >>>
> > > >>> Kindly post your suggestions.
> > > >>>
> > > >>> Regards,
> > > >>> Chitra
> > > >>>
> > > >>> On Fri, Nov 18, 2016 at 7:15 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> > > >>>>
> > > >>>> I think you've summed up exactly the differences!
> > > >>>>
> > > >>>> And, yes, it would be possible to emulate hierarchical facets on top
> > > >>>> of flat facets, if the hierarchy is fixed depth like year/month/day.
> > > >>>>
> > > >>>> But if it's variable depth, it's trickier (but I think still
> > > >>>> possible). See e.g. the Committed Paths drill-down on the left, on
> > > >>>> our dog-food server
> > > >>>> http://jirasearch.mikemccandless.com/search.py?index=jira
> > > >>>>
> > > >>>> Mike McCandless
> > > >>>>
> > > >>>> http://blog.mikemccandless.com
> > > >>>>
> > > >>>> On Fri, Nov 18, 2016 at 1:43 AM, Chitra R <chithu.r...@gmail.com> wrote:
> > > >>>> > case 1:
> > > >>>> > In taxonomy, for each indexed document, it examines the facet labels
> > > >>>> > and computes their ordinals and mappings, which will be stored in the
> > > >>>> > sidecar index at index time.
> > > >>>> >
> > > >>>> > case 2:
> > > >>>> > In doc values, these (ordinals) are computed at search time, so there
> > > >>>> > will be a time and memory trade-off between both cases, hope so.
> > > >>>> >
> > > >>>> > In taxonomy, building hierarchical facets at index time makes the
> > > >>>> > faceting cost at search time lower than for flat facets in doc values.
> > > >>>> >
> > > >>>> > Except (memory, time and NRT latency), is there any other contrast
> > > >>>> > between hierarchical and flat facets at search time?
> > > >>>> >
> > > >>>> > Kindly post your suggestions...
> > > >>>> >
> > > >>>> > Regards,
> > > >>>> > Chitra
> > > >>>> >
> > > >>>> > On Thu, Nov 17, 2016 at 6:40 PM, Chitra R <chithu.r...@gmail.com> wrote:
> > > >>>> >>
> > > >>>> >> Okay. I agree with you, Taxonomy maintains and supports hierarchical
> > > >>>> >> facets during indexing. Hope hierarchical in the sense, we might
> > > >>>> >> index the field Publish date: 2010/10/15 as Publish date: 2010,
> > > >>>> >> Publish date: 2010/10 and Publish date: 2010/10/15; their facet
> > > >>>> >> ordinals are maintained in the sidecar index and it is mapped to the
> > > >>>> >> main index.
> > > >>>> >>
> > > >>>> >> For example:
> > > >>>> >>
> > > >>>> >> In search-lucene.com, I enter a term (say facet); top documents and
> > > >>>> >> their categories are displayed after performing the search. Say I
> > > >>>> >> drill down through Publish date/2010 to collect its child counts and
> > > >>>> >> after that I will pass through publishdate/2010/10 to collect their
> > > >>>> >> child counts. And for each drill down, a search will be performed to
> > > >>>> >> collect its top docs and categories.
> > > >>>> >>
> > > >>>> >> Even I can achieve this in flat facets by changing the drill down
> > > >>>> >> query.
> > > >>>> >>
> > > >>>> >> Am I right or missed anything? Yet I don't know if I missed
> > > >>>> >> anything...
> > > >>>> >>
> > > >>>> >> So what is the need of hierarchical facets? Could you please explain
> > > >>>> >> it (hierarchical facets) in a real-world use case?
> > > >>>> >>
> > > >>>> >> Regards,
> > > >>>> >> Chitra
> > > >>>> >>
> > > >>>> >> On Wed, Nov 16, 2016 at 7:36 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> > > >>>> >>>
> > > >>>> >>> You store dimension + string (a single value path, since it's not
> > > >>>> >>> hierarchical) into SSDVFF so that you can compute facet counts,
> > > >>>> >>> either ordinary drill down counts or the drill sideways counts.
> > > >>>> >>>
> > > >>>> >>> You can see examples of drill sideways at
> > > >>>> >>> http://jirasearch.mikemccandless.com, e.g. drill down on any of
> > > >>>> >>> those fields on the left and you don't lose the previous facet
> > > >>>> >>> counts for that field.
> > > >>>> >>>
> > > >>>> >>> Mike McCandless
> > > >>>> >>>
> > > >>>> >>> http://blog.mikemccandless.com
> > > >>>> >>>
> > > >>>> >>> On Wed, Nov 16, 2016 at 8:51 AM, Chitra R <chithu.r...@gmail.com> wrote:
> > > >>>> >>> > Hi,
> > > >>>> >>> >
> > > >>>> >>> > Lucene-Drill sideways
> > > >>>> >>> >
> > > >>>> >>> > jira_issue:LUCENE-4748
> > > >>>> >>> >
> > > >>>> >>> > Is this the reason (i.e. drill sideways makes a very nice faceted
> > > >>>> >>> > search UI because we don't "lose" the facet counts after drilling
> > > >>>> >>> > in) behind storing path and dimension for the given SSDVF field?
> > > >>>> >>> > Else anything?
> > > >>>> >>> >
> > > >>>> >>> > Regards,
> > > >>>> >>> > Chitra
> > > >>>> >>> >
> > > >>>> >>> > Hey, thank you so much for the fast response. I agree NRT refresh
> > > >>>> >>> > is a somewhat costly operation and this is the major pitfall,
> > > >>>> >>> > suppose we use doc value faceting.
> > > >>>> >>> >
> > > >>>> >>> > While indexing SortedSetDocValuesFacetField, it stores path and
> > > >>>> >>> > dimension of the given field internally. So can we achieve
> > > >>>> >>> > hierarchical facets using DrillDownQuery? Hope, the purpose of
> > > >>>> >>> > storing path and dimension is to achieve hierarchical facets. If
> > > >>>> >>> > yes (i.e. we can achieve hierarchy in SSDVFF), what is the need to
> > > >>>> >>> > move over to taxonomy? Else I missed anything?
> > > >>>> >>> >
> > > >>>> >>> > What is the real purpose of storing path and dimension in the
> > > >>>> >>> > SSDVF field?
> > > >>>> >>> >
> > > >>>> >>> > Kindly post your suggestions.
> > > >>>> >>> >
> > > >>>> >>> > Regards,
> > > >>>> >>> > Chitra
> > > >>>> >>> >
> > > >>>> >>> > On Sat, Nov 12, 2016 at 4:03 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> > > >>>> >>> >>
> > > >>>> >>> >> On Fri, Nov 11, 2016 at 5:21 AM, Chitra R <chithu.r...@gmail.com> wrote:
> > > >>>> >>> >>
> > > >>>> >>> >> > i) Hope, when opening SortedSetDocValuesReaderState, we are
> > > >>>> >>> >> > calculating ordinals (this will be used to calculate facet
> > > >>>> >>> >> > counts) for the doc values field and only this makes the state
> > > >>>> >>> >> > instance somewhat costly. Am I right, or is there any other
> > > >>>> >>> >> > reason behind that?
> > > >>>> >>> >>
> > > >>>> >>> >> That's correct. It adds some latency to an NRT refresh, and some
> > > >>>> >>> >> heap used to hold the ordinal mappings.
> > > >>>> >>> >>
> > > >>>> >>> >> > ii) During indexing, we are providing facet ordinals in each
> > > >>>> >>> >> > doc and I think it will be useful on the search side, to
> > > >>>> >>> >> > calculate facet counts only for matching docs. Otherwise, does
> > > >>>> >>> >> > it carry any other benefits?
> > > >>>> >>> >>
> > > >>>> >>> >> Well, compared to the taxonomy facets, SSDV facets don't require
> > > >>>> >>> >> a separate index.
> > > >>>> >>> >>
> > > >>>> >>> >> But they add latency/heap usage, and they cannot do hierarchical
> > > >>>> >>> >> facets yet (though this could be fixed if someone just built it).
> > > >>>> >>> >>
> > > >>>> >>> >> > iii) Is SortedSetDocValuesReaderState thread-safe (ie) multiple
> > > >>>> >>> >> > threads can call this method concurrently?
> > > >>>> >>> >>
> > > >>>> >>> >> Yes.
> > > >>>> >>> >>
> > > >>>> >>> >> Mike McCandless
> > > >>>> >>> >>
> > > >>>> >>> >> http://blog.mikemccandless.com
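
For the "another use case" question in this thread (min and max of a numeric field over the matching documents, then splitting that span into ranges), here is a minimal sketch of just the arithmetic. It is only an illustration under stated assumptions: the class and method names (RangeBuckets, minMax, lowerBounds, countPerRange) are hypothetical, no Lucene call is shown, and the long[] simply stands in for the values you would read from the numeric doc values field for each matching document. Equal-width ranges are an assumed choice, not the variance-based "interesting" buckets Shai describes.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene. The input array stands in for the
// per-document values read from a numeric doc values field for matching docs.
final class RangeBuckets {

    /** Returns {min, max}; the array must be non-empty. */
    public static long[] minMax(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }

    /**
     * Width of each range: ceil((max - min + 1) / buckets).
     * Assumes buckets >= 1 and that max - min does not overflow a long.
     */
    private static long width(long min, long max, int buckets) {
        return (max - min + buckets) / buckets;
    }

    /** Inclusive lower bound of each of the equal-width ranges. */
    public static long[] lowerBounds(long min, long max, int buckets) {
        long w = width(min, max, buckets);
        long[] bounds = new long[buckets];
        for (int i = 0; i < buckets; i++) {
            bounds[i] = min + i * w;
        }
        return bounds;
    }

    /** Counts how many values fall into each equal-width range. */
    public static int[] countPerRange(long[] values, int buckets) {
        long[] mm = minMax(values);
        long w = width(mm[0], mm[1], buckets);
        int[] counts = new int[buckets];
        for (long v : values) {
            counts[(int) ((v - mm[0]) / w)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {10L, 20L, 35L, 50L};
        long[] mm = minMax(values);
        System.out.println("min/max: " + Arrays.toString(mm));
        System.out.println("bounds:  " + Arrays.toString(lowerBounds(mm[0], mm[1], 4)));
        System.out.println("counts:  " + Arrays.toString(countPerRange(values, 4)));
    }
}
```

In a real search the first pass would gather min and max from the matching documents, and the computed bounds could then be turned into whatever range definitions your faceting code expects; how that is wired up depends on the Lucene version and is left out here.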